Abstract


Many heap-oriented languages such as Lisp and Id depend on run-time garbage collection to re- 
claim storage. Garbage collection can be a significant run-time expense, especially for functional 
languages that tend to allocate structures often. Compiler-directed storage reclamation reduces 
the run-time overhead of garbage collection by having the compiler insert deallocation code. Com- 
pilers must perform object lifetime analysis in order to insert storage reclamation code. Current 
approaches to lifetime analysis assume a strict or sequential interpreter. 


We formulate an operational semantics for a parallel, non-strict language in order to precisely 
define when it is safe to deallocate an object. Our operational semantics yields exact information 
about what objects are allocated, deallocated, and referenced at any point during the execution of 
a program. Using this information, we define precise run-time conditions that must be met by safe 
deallocation commands. 


We use abstract interpretation to yield at compile-time a summary of what objects are allocated 
and reachable at any point in a program. We define static conditions that must be met by safe 
deallocation commands. We then define an algorithm that uses the abstract interpreter to verify the 
safety of deallocation commands already in programs and an algorithm to insert safe deallocation 
commands into programs. 


We describe our implementation of the lifetime analysis, the verification algorithm, and the insertion 
algorithm. We then discuss the effectiveness of the compiler at verifying and inserting deallocation
commands in several medium-sized Id programs. We also discuss the performance of each program 
in terms of storage allocated and reclaimed. Our implementation is quite effective for programs 
with simple patterns of sharing between objects. 
Chapter 1


Introduction 


Many modern programming languages are oriented towards dynamic storage management. Lisp- 
like languages and functional languages such as Id [35, 33, 34] are heap-oriented languages. In 
these languages the storage for arrays and other aggregate objects is not necessarily associated 
with the invocation of the particular procedure that allocated the object. Storage for an aggregate 
is allocated on a heap when the aggregate is created so objects can last longer than the procedure 
that created them. The storage in which an object resides can only be reclaimed when the program 
no longer uses the object, where a use of an object is a reference to the contents of the object. The 
lifetime of an object is the period of time from the object’s allocation until the time when the last 
reference is made to the object. 
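
As a minimal illustration of this definition, the following sketch computes object lifetimes from a
recorded event trace. The trace format and the names Event and lifetimes are assumptions made for
this example only; they are not part of Id or of the analysis developed in this thesis, which must
determine lifetimes without running the program.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format: (time, kind, object_id), where kind is "alloc" or "ref".
Event = Tuple[int, str, str]


def lifetimes(trace: List[Event]) -> Dict[str, Tuple[int, int]]:
    """Map each object to (allocation time, time of its last reference)."""
    result: Dict[str, Tuple[int, int]] = {}
    for time, kind, obj in sorted(trace):
        if kind == "alloc":
            # An object that is never referenced has a lifetime ending at allocation.
            result[obj] = (time, time)
        elif kind == "ref":
            start, _ = result[obj]
            result[obj] = (start, time)
    return result


trace = [(0, "alloc", "a"), (1, "alloc", "b"), (2, "ref", "a"),
         (5, "ref", "a"), (3, "ref", "b")]
print(lifetimes(trace))  # {'a': (0, 5), 'b': (1, 3)}
```

The storage for object a could be reclaimed any time after step 5, and the storage for b any time
after step 3.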


Storage in heap-oriented languages is often reclaimed implicitly, by a garbage collector. A garbage 
collector intermittently or incrementally traverses the heap, stacks, and other program data struc- 
tures to determine which objects are reachable by the program, and then clears all other objects 
(the garbage) from the heap. 
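
The reachability traversal at the heart of such a collector can be sketched as follows. This is a
deliberately simplified model, assuming the heap is a dictionary from object names to the names of
the objects they point to; the function names reachable and collect are hypothetical, and real
collectors work on raw memory and are considerably more involved.

```python
from typing import Dict, List, Set


def reachable(heap: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Return every object reachable from the roots by following pointers."""
    marked: Set[str] = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue  # already visited; this also handles cyclic structures
        marked.add(obj)
        pending.extend(heap[obj])
    return marked


def collect(heap: Dict[str, List[str]], roots: List[str]) -> Dict[str, List[str]]:
    """Keep only the reachable objects; everything else is garbage."""
    live = reachable(heap, roots)
    return {obj: fields for obj, fields in heap.items() if obj in live}


heap = {"a": ["b"], "b": [], "c": ["a"], "d": ["d"]}
survivors = collect(heap, roots=["a"])
print(sorted(survivors))  # ['a', 'b']
```

Objects c and d are reclaimed even though c points into live data and d points to itself, because
neither can be reached from the root.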


The standard alternative to garbage collection, explicit storage management, requires the pro- 
grammer to insert commands to reclaim storage so that the program does not run out of memory. 
Explicitly deallocating structures is often an extremely error-prone process, because it is not always 
clear where a structure is passed and when it is no longer referenced. Furthermore, changes to a 
program can cause deallocation commands to become incorrect. 


It is easier to develop correct programs when using implicitly managed storage, because the pro- 
grammer does not have to worry about storage management. Unfortunately, implicit storage man- 
agement is typically more expensive than explicit storage management. One source of overhead in 
implicit storage management is the determination of which storage is no longer in use. Another 
source of overhead is that garbage collected systems typically use more storage at any point in time 
than explicitly managed systems because they reserve a significant fraction of memory (up to half) 
for use solely during the garbage collection process. Furthermore, storage is usually not reclaimed 
as soon as it ceases to be used in a garbage collected system, and so more storage is allocated at 
any point in time than in an explicitly managed system. 


In this thesis, we present a third possibility: implicitly managed storage without all of the run-time 
overhead of garbage collector managed storage. We accomplish this by having the compiler analyze 
programs and insert storage deallocation commands, thus lifting much of the burden of storage
management from the programmer. The compiler will not be able to pick up all of the garbage, 
and so the rest will have to be handled by the programmer or by a run-time garbage collector. In 
this thesis we develop and evaluate a method of static analysis of programs to determine object
lifetimes. Our goal is to determine if this analysis is useful and to determine to what extent storage 
management can be done by the compiler. General issues of garbage collection and heap-allocation 
algorithms are orthogonal to this work. 


This thesis makes a number of contributions to the area of lifetime analysis and to the practice of 
static program analysis. First, it develops a general framework for the lifetime analysis of a parallel 
language using the theory of abstract interpretation. Second, it defines abstract representations for 
a variety of data types, including tuples, arrays, algebraic types, recursive types, and higher-order 
functions. Third, it attempts to characterize the costs and effectiveness of these techniques when 
applied to real programs. 


Past work in lifetime analysis has mostly been on sequential languages. We know of no work which 
performs lifetime analysis on either sequential or parallel non-strict languages. The lifetime analysis 
method described in this thesis applies to parallel, strongly-typed, single-assignment languages. 
Slight variations in the methods we use allow us to analyze either strict or non-strict programs. 
We present the work in terms of a non-strict language and discuss the changes necessary to apply 
our methods to a strict language. 


Our main goal was to develop a framework for lifetime analysis and to determine its effectiveness. 
In fact, we have developed a general framework for abstract interpretation of parallel and non- 
strict languages with a rich variety of types. This abstract interpreter could be used to perform 
interference analysis or even strictness analysis instead of lifetime analysis, although we have not 
pursued these topics. We do consider a limited form of sharing analysis to determine if the elements 
of arrays may be shared. We also discuss extensions to the lifetime analysis of recursive types that 
would allow us to determine whether objects form directed acyclic graphs or trees. 


We have found our implementation of these methods to be quite effective in determining the lifetimes 
of objects in real Id programs. We have implemented this work as part of the Id compiler [40] and 
applied it to several programs of 100 to 1000 lines. The implementation is structured to support 
separate compilation — a program can be compiled in bottom up fashion and each procedure is 
verified/transformed individually. The augmented compiler was able to compile these programs into
object code that deallocated 80 to 100 percent of the total storage that they allocated, at the cost
of a 1.5- to 5-fold increase in compile time. We have also found that although a programmer
could insert all of the deallocation code that the compiler inserted, it would require a major change 
in programming style to do so. 


1.1 Thesis Overview 


Before we can develop a lifetime analysis algorithm, we must have a well-defined notion of the 
lifetime of an object. We defined the lifetime of an object to be the period of time during the 
execution of a program from when the object was allocated until the object was no longer referenced. 
Lifetime analysis is the process of determining the range of program points during which the object 
bound to a particular variable may be referenced by a program. Thus, lifetime analysis is intimately 
related to the operational semantics of a program. 


In this thesis, we first develop an operational semantics for a non-strict, parallel language. The 
operational semantics is defined by an interpreter that gives the standard semantics of a program 
in terms of its behavior. We then use this semantics to define the lifetimes of objects allocated 
by programs. Our definition of object lifetimes is exact, but can only be determined at run-time, 
as a program is executing. We also use this interpreter to define when deallocation commands
are correct. Correct deallocation commands never lead to run-time errors in which an object is 
deallocated before the end of its lifetime. 


Because we must be able to determine object lifetimes at compile-time, we need some way of 
converting this exact, run-time notion of object lifetimes into a compile-time property. We use an 
abstraction of the standard interpreter to give an approximation of the behavior of the program. 
The purpose of the abstract interpreter is to generate an approximation of a program’s behavior 
over all input data, and, in the case of parallel programs, over all execution orders. In addition, the 
abstract interpreter must be decidable — we must be able to compute this approximate behavior 
in a finite amount of time. 


Given the abstract interpretation of a program, we show how to compute approximate object 
lifetimes. We are willing to use approximate object lifetimes in order to develop an algorithm that 
terminates, as long as the approximations are all safe. A safe approximation of an object’s lifetime 
is guaranteed to include the actual lifetime of an instance of that object at run-time. 
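
The containment requirement can be stated as a small predicate. The sketch below assumes, purely
for illustration, that lifetimes are closed intervals over a single linear time line; in the parallel
setting of this thesis, program points are only partially ordered, so this is a simplification. The
names is_safe_approximation and dealloc_is_safe are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (allocation time, time of last reference)


def is_safe_approximation(approx: Interval, actual: Interval) -> bool:
    """An approximate lifetime is safe if it contains the actual lifetime."""
    return approx[0] <= actual[0] and actual[1] <= approx[1]


def dealloc_is_safe(dealloc_time: int, approx: Interval) -> bool:
    """Freeing strictly after the approximate lifetime cannot precede a reference."""
    return dealloc_time > approx[1]


actual = (2, 7)
print(is_safe_approximation((0, 10), actual))  # True: over-approximates
print(is_safe_approximation((3, 10), actual))  # False: misses the allocation
print(dealloc_is_safe(11, (0, 10)))            # True
```

A safe approximation may delay reclamation, but a deallocation placed after it can never free an
object that is still in use.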


We present two algorithms that use the information about object lifetimes. The first algorithm 
verifies that the deallocation commands in a program are all safe. The second algorithm inserts safe 
deallocation commands into programs automatically. This second algorithm allows the programmer 
to write programs in which storage is implicitly reclaimed. 


For the remainder of this thesis, we talk about the language KID~ [3], a specific parallel, non-strict, 
single-assignment language with higher-order functions. KID, or Kernel Id, is an intermediate 
language developed by Ariola and Arvind to express the semantics of Id [35, 33, 34] and to express 
the compilation of Id programs. In this thesis, we consider KID~ to be KID without higher-order 
functions and M-structures [7] (structures with per-element mutual exclusion). 


In Chapter 2 we present the syntax and standard semantics of KID~ and we discuss the unusual 
evaluation strategy used by the KID~ interpreter. In Chapter 3 we develop an augmented, or
instrumented, interpreter that allows us to define exactly when deallocation commands are correct 
and incorrect. 


In Chapters 4 and 5 we restrict KID~ programs to operate only on tuples, numbers, and booleans. 
In the first of these chapters we develop an abstracted interpreter for KID~ and show that it is safe 
with respect to the instrumented interpreter. Our definition of safety is that object reachability 
must be preserved by the abstract interpreter. In the second of these chapters we use the abstracted 
interpreter to give an algorithm for verifying safe deallocation commands and an algorithm for 
inserting deallocation commands. In Chapter 6 we discuss a method for improving the effectiveness 
of the lifetime analysis by improving the abstract interpreter. 


In Chapters 7, 8, and 9 we describe the additions to the abstract interpreter necessary to handle 
arrays, algebraic types, recursive types, and higher-order functions. 


In Chapter 10, we describe our implementation of the deallocation command verification and in- 
sertion algorithms and their effectiveness on several programs. Finally, we give our conclusions on 
this work in Chapter 11. 


The remainder of this chapter gives some background on lifetime analysis and storage management. 
Section 1.2 describes previous work relating to lifetime analysis. Section 1.3 describes the assump- 
tions we make about storage management. Section 1.4 compares the cost of garbage collection 
with the cost of explicit storage management. Section 1.5 describes explicit storage management in 
Id programs. Finally, Section 1.6 describes the safety condition that must be met by deallocation 
commands in Id programs. 
1.2 Background


This section gives some background on the problem of storage management. We start by describing 
the various storage management strategies, and then we go into more detail about the techniques 
used in implicit storage management. There are several ways in which each technique could be 
classified, and so the division of techniques into groups is somewhat arbitrary. 


The problem of storage management has existed since the first computer program was written. In 
early programming languages such as Fortran, storage is statically allocated by the programmer. 
Under the static management paradigm, the programmer or compiler allocates storage for all 
structures by creating a memory map that places each object in a fixed position. In early Fortran 
implementations, all procedure activations and all data structures were statically allocated. In 
modern computer languages, some data structures may be statically allocated. There is no direct 
run-time cost for the management of statically allocated storage — this all occurs at compile-time 
when the memory map is constructed. 


Static allocation is not always possible: the activation frames for recursive procedures cannot be 
statically allocated. A separate activation frame must be allocated for each recursive procedure 
invocation. For this reason, procedure activation frames, including storage for procedure-local 
objects, are usually stack managed. Temporary structures, declared locally in procedures, may 
also be stack allocated. 


Under stack management, storage is managed by having a pointer to the next word to be allocated, 
incrementing this pointer to allocate storage and decrementing this pointer to deallocate storage. 
Under stack discipline, objects must be deallocated in the reverse of the order in which they were
allocated (last-in-first-out). Stack management allows greater flexibility than static allocation,
because the number and size of objects does not have to be known at compile-time. 
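
The pointer manipulation described above can be sketched as a small allocator. This is an
illustrative model, assuming storage is a fixed array of words addressed by index; the class name
StackAllocator and its methods are hypothetical.

```python
class StackAllocator:
    """Bump-pointer stack of words; sizes and addresses are word indices."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.top = 0  # index of the next word to be allocated

    def allocate(self, words: int) -> int:
        if self.top + words > self.size:
            raise MemoryError("stack overflow")
        address = self.top
        self.top += words  # increment the pointer to allocate
        return address

    def deallocate(self, address: int, words: int) -> None:
        # Stack discipline: only the most recently allocated block may be freed.
        if address + words != self.top:
            raise ValueError("blocks must be freed in reverse allocation order")
        self.top -= words  # decrement the pointer to deallocate


stack = StackAllocator(16)
a = stack.allocate(4)   # occupies words 0..3
b = stack.allocate(2)   # occupies words 4..5
stack.deallocate(b, 2)  # b was allocated last, so it is freed first
stack.deallocate(a, 4)
```

Both operations take constant time, which is why stack management is so much cheaper than the
heap management discussed below; the price is the last-in-first-out restriction enforced in
deallocate.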


Objects allocated on the same stack as activation frames are automatically deallocated when the 
procedure that allocated them returns to its parent. If a pointer to this object is returned to 
the parent procedure, the parent may attempt to refer to the contents of a defunct structure. 
This scenario is known as the dangling pointer problem. The danger is that the object may be 
overwritten when another procedure call is made, and that the parent will thereafter read spurious 
data. 


Sometimes it is necessary for objects to survive longer than the procedures that allocated them. 
In this case, they must be handled by a heap management algorithm that allows objects to be 
allocated and deallocated in an arbitrary order. Heap management increases the expressiveness of 
a language but complicates the storage manager by making it more expensive computationally. The 
heap manager must keep track of which storage is in use and which storage is free to be allocated, 
while trying to minimize wasted storage due to mismatches between the sizes of objects requested 
and the sizes of objects actually allocated. 


In many languages that support heap allocation, such as C, both allocation and deallocation must 
be specified by the programmer. If the programmer does not deallocate structures that are no 
longer needed, then the program consumes an inordinate amount of memory, possibly causing the 
program to fail. If the programmer deallocates an object too soon, then the program may behave 
incorrectly due to a dangling pointer error. 


The explicit deallocation of structures whose lifetime is not tied to that of a procedure invocation 
is difficult — the programmer must ensure that no more references to an object are made anywhere 
in the program. The difficulty is increased if the pattern of sharing among objects is complex. 
For this reason, many heap-oriented languages have run-time system support to automatically (or
implicitly) deallocate storage that is no longer in use. 


In heap-oriented languages such as Lisp, storage management is implicit: all structures allocated 
are typically allocated on a heap, and the run-time system automatically takes care of deallocating 
structures that are no longer accessible from the program. Structures that are not accessible from 
the program are called garbage. The garbage collector, a part of the run-time system, periodically 
scans the heap and invocation stacks, and finds and reclaims all unreachable objects. In general, 
garbage collection is more expensive than explicit heap management; in addition to reclaiming 
storage the garbage collector must determine which objects are garbage. The benefit from using 
a garbage collected system is that the user does not have to worry about not deallocating enough 
storage or about deallocating storage too early. 


1.2.1 Lifetime Analysis 


Lifetime analysis was first suggested by Barth [5] as an optimization to shift some of the run-time 
overhead of garbage collection to compile-time. His approach is to take Lisp programs that had 
reference counting code inserted, and to use dataflow (live variable) analysis to determine that 
a particular variable in the program will always be associated at run-time with a structure with 
reference count 1. When a variable is determined to be dead, then code can be inserted to free 
the associated structure. Barth also discusses several local transformations that optimize reference 
counting code inserted by the compiler. Although his method only inserts deallocation code if it 
determines that there is exactly one reference to a structure, he claims that this optimization is 
powerful enough to reclaim a significant amount of temporary storage in Lisp programs, because 
studies by Clark [11] show that most structures in Lisp programs are referred to exactly once. 
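
The general idea of combining live variable analysis with a "reference count 1" guarantee can be
sketched as follows. This is not Barth's algorithm: it handles only straight-line code, and the set
unshared is supplied as an input that stands in for the determination that a variable's structure
has exactly one reference. The statement format and the name insert_frees are assumptions made
for this example.

```python
from typing import List, Set, Tuple

# Hypothetical straight-line program: each statement is (target, operands).
Stmt = Tuple[str, Tuple[str, ...]]


def insert_frees(program: List[Stmt], unshared: Set[str],
                 live_out: Set[str]) -> List[str]:
    """Emit the program with `free v` after the last use of each unshared variable."""
    live = set(live_out)  # variables still needed after the current statement
    frees_after: List[List[str]] = [[] for _ in program]
    for i in range(len(program) - 1, -1, -1):  # backward live-variable pass
        target, operands = program[i]
        for v in dict.fromkeys(operands):  # de-duplicate, keep operand order
            if v not in live and v in unshared:
                frees_after[i].append(v)  # statement i is the last use of v
        live.discard(target)
        live.update(operands)
    out: List[str] = []
    for (target, operands), frees in zip(program, frees_after):
        out.append(f"{target} = op({', '.join(operands)})")
        out.extend(f"free {v}" for v in frees)
    return out


program = [("a", ()), ("b", ("a",)), ("c", ("a", "b")), ("d", ("c",))]
for line in insert_frees(program, unshared={"a", "b", "c"}, live_out={"d"}):
    print(line)
```

For this program the sketch frees a and b immediately after the statement defining c, and frees c
after the statement defining d; d itself is live on exit and is never freed.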


1.2.2 Dataflow Analysis 


Barth’s method was limited because the analysis could not follow pointers or procedure calls. There 
have been several approaches that attempt to solve these problems. 


Ruggieri and Murtagh [39] developed an interprocedural lifetime analysis framework for a statically 
typed, monomorphic language. Their algorithm computes the set of object sources which may be 
bound to each variable before each statement in the program is executed. They represent nested 
objects as subvariables, with labeled edges connecting variables with the contents of their various 
fields. Recursively typed objects have a potentially infinite number of subvariables; so Ruggieri and 
Murtagh introduce an operator that summarizes an infinite graph of subvariables by one in which 
the longest path is bounded by n, where n is a parameter of the analysis. 
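
The effect of bounding path length can be illustrated with a generic truncation of access paths.
This sketch is not Ruggieri and Murtagh's operator; it only shows how an unbounded family of
paths collapses to a finite set once paths longer than n are summarized. The representation of
paths as tuples, the "*" summary marker, and the name limit are assumptions of the example.

```python
from typing import Set, Tuple

Path = Tuple[str, ...]  # e.g. ("x", "tail", "tail", "head"): variable, then fields


def limit(paths: Set[Path], n: int) -> Set[Path]:
    """Keep at most n field selections per path; summarize anything deeper."""
    limited: Set[Path] = set()
    for path in paths:
        root, fields = path[0], path[1:]
        if len(fields) > n:
            # "*" stands for every object reachable beyond the first n fields.
            limited.add((root,) + fields[:n] + ("*",))
        else:
            limited.add(path)
    return limited


# Paths x, x.tail, x.tail.tail, ... into a list of six cells.
paths = {("x",) + ("tail",) * k for k in range(6)}
summary = limit(paths, 2)
print(len(paths), len(summary))  # 6 4
```

Larger values of n distinguish more of the structure at the cost of a larger representation; every
path deeper than n is merged into a single summary entry.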


Larus and Hilfinger [29] developed an analysis, similar to that of Ruggieri and Murtagh, which computes the possible
aliases between structure accesses. They show how to use standard dataflow techniques to compute 
their alias graphs. They also show that precise computation of alias relations in a single function 
is NP-complete. 


Hendren and Nicolau [20] take a different approach to solving the finite representation problem. 
They define an analysis framework that uses path matrices to do interference analysis for par- 
allelization. These path matrices show the paths of possible interference between two successive 
program points. Each element of a path matrix uses a regular expression of field names to name an 
access path through a recursively typed object. This naming scheme guarantees that access paths 
are of finite size. Hendren and Nicolau’s method automatically detects non-shared lists and trees
in an imperative language. The interference analysis Hendren and Nicolau developed can be recast 
as a lifetime analysis by determining all the statements from which a given structure is reachable 
— the control region bounded by those statements bounds the lifetime of the structure. 


Chase, Wegman and Zadeck [10] attempt to improve the method by which information about data 
structures is summarized. Their method takes programs in static single assignment form [14] and 
constructs a storage shape graph (SSG) that represents the interconnectedness of structures in the 
heap. Each node in the graph represents a structure allocated by a different allocation statement. 
The number of nodes in an SSG is bounded by the sum of the number of allocation statements and 
the number of variables in a program. Storage shape graphs are augmented with heap reference 
counting to determine the lifetime of a structure and to determine if a structure is acyclic. 
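
The naming scheme behind such a graph, one node per allocation statement, can be illustrated by
collapsing a concrete heap. This sketch is only an after-the-fact summary of a heap that already
exists, whereas the analysis of Chase, Wegman and Zadeck builds its graph statically; it also omits
the variable nodes. The data layout and the name summarize are assumptions of the example.

```python
from typing import Dict, List, Set, Tuple


def summarize(objects: Dict[int, str],
              pointers: List[Tuple[int, int]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collapse a concrete heap into one node per allocation site."""
    nodes = set(objects.values())
    edges = {(objects[src], objects[dst]) for src, dst in pointers}
    return nodes, edges


# One header allocated at site "L1" and five list cells allocated in a loop at "L3".
objects = {0: "L1", 1: "L3", 2: "L3", 3: "L3", 4: "L3", 5: "L3"}
pointers = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
nodes, edges = summarize(objects, pointers)
print(sorted(nodes))  # ['L1', 'L3']
print(sorted(edges))  # [('L1', 'L3'), ('L3', 'L3')]
```

However long the list grows, the summary keeps two nodes. The self-edge on L3 records that cells
from that site may point to other cells from the same site, which is why additional information
such as reference counts is needed to tell a list from a cyclic structure.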


1.2.3 Analyses Based on Abstract Interpretation 


These techniques all consist of a set of ad hoc rules for analyzing programs. Cousot and Cousot [12] 
developed abstract interpretation, a method for simulating the execution of a program in order to 
determine the behavior of a program. The use of abstract interpretation allows the derivation of 
an analysis framework from the operational semantics of a programming language. 


Jones and Muchnick [27] used abstract interpretation to develop a general framework for
interprocedural dataflow analysis of programs with recursive data structures. They extend the Cousots’
work on dataflow analysis of flowcharts to work with recursive data structures. They use tokens 
to provide local representations of lists. Tokens are labels derived from program states. Their flow 
analyzer constructs a retrieval function that takes a token and reconstructs the list or lists locally 
described by that token. This retrieval function is really an abstraction of a store, where a store 
maps locations to list values. 


Jones and Muchnick describe a version that analyzes a simple first-order language. This version uses
node labels as tokens, and divides tokens into atoms and lists. Their general framework could be 
adapted to a variety of analyses by plugging in the appropriate domains and operational semantics. 
There is a great deal of freedom in choosing tokens. Tokens can be more specific, e.g., whole states, 
in which case the analysis will be more precise but computationally intractable, or more general, 
e.g., node labels, in which case the analysis will converge faster but give less precise information. 


Horwitz, Pfeiffer, and Reps [22] use the Jones and Muchnick framework to compute an abstraction
of memory where each location is labeled by the program points that modify its contents. They 
show that their analysis is correct for all implementations of the underlying operational semantics. 
The framework of Horwitz et al. does not do interprocedural analysis.


One enhancement to this framework is the ability to handle higher-order functions. Deutsch [15] 
develops a static analysis method for determining the aliasing and lifetimes of objects in a strict, 
higher-order functional language with first class continuations. His work is also based on that of 
Jones and Muchnick. Deutsch presents a low-level operational semantics defined in terms of state
transition rules, and abstracts this semantics to obtain an analysis algorithm. He uses complete 
program states to label objects uniquely in the standard semantics and uses an abstraction of 
program states to label objects in the abstract semantics. 


Rather than presenting a low-level operational semantics, Harrison [19] presents an analysis in 
terms of a high-level operational semantics for Scheme. Harrison develops an analysis that could 
be used to make storage management and parallelization decisions about Scheme programs with 
first class continuations, side effects and higher-order functions. His work takes an approach similar
to that of Jones and Muchnick. The correct modeling of control flow in the presence of continuations
adds to the complexity of Harrison’s method. He uses procedure strings to name all points in the 
execution of a program. A procedure string consists of a sequence of symbols naming the procedure 
bodies that have been entered and exited along the execution path to a program point. Harrison 
models aggregate objects as higher-order functions. 


1.2.4 Analysis of Parallel Languages 


All of these techniques were developed for sequential programming languages, even though the orig- 
inal work on abstract interpretation was defined in terms of flow graphs, which are not necessarily 
sequential. Much of the work on abstract interpretation has been done on functional languages, 
which are often touted as being parallel languages. Even so, most of the work on lifetime analysis of 
functional languages has been done with respect to a sequential implementation. There have been 
a few approaches that do not assume a sequential implementation, which we will describe below. 


Hudak [23] describes an analysis based on abstract interpretation of a reference counting interpreter 
for a strict, functional language operating on arrays of numbers. Even though the language is 
functional, the denotational semantics he presents is sequential, because it performs side effects in 
the form of reference counting operations. 


Thomas Johnsson [26] developed an analysis method for modeling heap contents based on the 
framework of Jones and Muchnick. His analysis is to be used in optimizing graph reduction
intermediate code that resulted from compiling a lazy, functional language. Although the language
being compiled is not sequential, the interpreter of the intermediate code is sequential. The
intermediate code is imperative and contains explicit code to construct and evaluate closures. The
parallelism in the source language is simulated by interleaving execution of subexpressions in the 
intermediate code. 


Ranelletti [38] describes an analysis method on dataflow graphs representing parallel programs 
written in SISAL [16]. These dataflow graphs only give a partial order on the execution order of 
expressions in a program. This method allows the compiler to transform graphs so that storage 
is preallocated for arrays that are incrementally defined by a program. Preallocation reduces the 
number of arrays that need to be allocated and reduces the number of times array elements are 
copied from one array to another. Ranelletti’s method is very efficient: it takes O(n) compile
time, where n is the size of the program being analyzed. Unfortunately, extending it to handle
interprocedural analysis will make it much less efficient: it will take O(2^n) compile time.


Cann [9] describes an analysis technique on SISAL dataflow graphs that allows arrays or array dope- 
vectors to be updated in place whenever it can be shown that the updater is the only consumer of 
the array. This method is also based on parallel programs. However, some of his transformation 
techniques add dependence edges that increase the sequentiality of the program in order to perform 
update-in-place optimizations. 

In addition to these graph-based approaches, there have been a number of analysis frameworks
based on abstract interpretation that are interesting because they also do not assume a sequential
interpreter. The work by Young and O’Keefe [45] and the work by Aiken and Murphy [1] fall into this
category.


Young and O’Keefe developed a type evaluator for a lazy dialect of Scheme. This evaluator computes 
an approximation to the set of possible values to which each expression in a program could evaluate. 
The only data structure that they considered was untyped pairs. In order to analyze recursive
functions on lists, their analyzer approximated infinite sets of values by cyclic type representations. 
Although the evaluator described by Young and O’Keefe yields an approximation of the values 
computed by each expression in a program, it is not viable for use in lifetime analysis because there 
is no way to determine the sharing or reachability of objects from any expression in the program. 


Aiken and Murphy developed a similar type inferencer for the strict functional language FL. Their 
approach uses type expressions as the abstract value domain and a set of rewrite rules to give the 
operational semantics of FL. The language of type expressions includes a fix operator that defines
an infinite set of regular tree types by a finite representation. These recursive type expressions are 
used when deriving the type of recursive functions. 


Aiken and Murphy’s type inferencer uses the rewrite rules as constraints in a proof system to derive 
the types of FL expressions. In the case of recursive functions, heuristics must be used to choose 
which rewrite rule to apply, because more than one rewrite rule may be applicable to a given 
instance of a recursive function. 


Park and Goldberg [37] developed an analysis framework based on abstract interpretation of a 
higher-order functional language. Their framework computes an approximation of how much of a 
nested list value passed to a function escapes as part of the result of that function. They did not 
precisely define the standard semantics that they were abstracting. 


Jones and Le Métayer [28] developed three analyses framed as abstract interpretations of programs: 
sharing, transmission, and necessity analysis. These analyses are defined for an expression-oriented 
language with lists as the only data structure. Jones and Le Métayer did not state precisely the 
standard semantics corresponding to the abstract semantics used in the analyses, and so it is 
difficult to see how to generalize this method to other data structures. 


There are two problems with the last two approaches to determining object lifetimes. The first is 
that they do not have a good correspondence with any standard semantics. The point of methods 
based on abstract interpretation is that the analyses can be shown to be safe with respect to the 
standard semantics. The second problem is that objects are not named, and so the analyses fail if 
the source languages are made imperative or non-strict, because there is no way to handle cyclic 
structures in these frameworks. 


1.2.5 Analyses Based on Type Deduction 


There is one more semantics-based approach to analysis that defines the analysis in terms of type 
deduction or type checking using a non-standard type system. Lucassen and Gifford [31] define a 
type and effects system for the FX language [17] that can be used to determine the lifetimes of 
objects. FX-87, based on the second-order lambda-calculus, has a kind system consisting of type 
and effect annotations. The effect annotations describe which regions are allocated into, written 
to, or read from during the execution of an expression. Effect annotations on procedure values 
describe not only the effects incurred by evaluating the procedure value, but also the latent effects 
incurred by applying the procedure value to arguments. Lucassen and Gifford show how the effect 
descriptions can show that the lifetime of an object resulting from a particular expression has 
limited extent. 


This approach requires the user to annotate programs with type and effect declarations before the 
compiler can perform type and effect checking and lifetime analysis. Use of this approach would 
also allow the compiler to check the safety of explicit storage management in some cases. In later 
work Gifford et al. [18] extend the FX compiler to perform type and effect deduction, but in this 
effect system they dropped the information about storage regions. It is unclear from the paper 
whether there is an efficient or decidable algorithm for deducing types and effects with regions for 
FX programs. 


Baker [4] states that the structure-sharing unification algorithm for Milner-style type 
inference already produces a certain amount of sharing information for functional languages. Each 
node that represents the type of an expression in a program corresponds to a set of run-time 
objects. In a functional language, distinct type nodes represent disjoint sets of run-time objects, 
while unified type nodes represent overlapping sets of run-time objects. The advantage of using 
type inference for sharing analysis is that the algorithms for type inference are efficient enough to 
be used in production compilers. The disadvantage of this approach is that it cannot be extended 
to imperative programming languages without greatly increasing the complexity of the analysis. 


1.3 Storage Management Assumptions 


Let us assume that objects allocated by a program can be placed either in the activation frame of 
the procedure that allocated the object or on an implicitly or explicitly managed heap. Objects 
placed in procedure activation frames are automatically deallocated when the procedure terminates; 
consequently, the lifetime of these objects must be bounded by the lifetime of the procedure. In 
our implementation of Id, only fixed size objects may be frame-allocated because a procedure’s 
activation frame cannot be extended once it has been allocated. 


We believe that the applications in which we are interested would suffer too much of a performance 
penalty if they depended solely on run-time garbage collection. One characteristic of these applications 
is the use of large amounts of data, often held in large arrays. The behavior of garbage 
collectors in the presence of large, shallow or flat data structures is not well understood, but applications 
typically manage these structures explicitly even though garbage collection is used to 
manage other structures. In these programs, most storage reclamation should be done explicitly, 
either by explicit deallocation or explicit reuse of structures. We would like to automate the process 
of explicitly managing these large structures. It is very difficult, and often impossible, for either a 
programmer or a compiler to explicitly reclaim all structures allocated, and so we will continue to 
have a garbage collector that reclaims the storage that cannot be reclaimed explicitly. 


This thesis does not explore the best ways for the heap manager and garbage collector to interact. 
The way explicit and implicit storage management interact depends to a large extent on the choice 
of garbage collection method and characteristics of the run-time system. One possibility is for the 
heap manager to allocate areas that are never garbage collected, and to use these areas for objects 
that are guaranteed to be deallocated eventually. The objects in these areas would never have to 
be copied by the garbage collector, and so we would save on the overhead of copying these objects. 
Another possibility is to use a reference counted garbage collector and to set the reference counts 
of objects whose lifetimes can be determined to one upon allocation and to zero when no longer 
needed, but not to perform reference counting operations on the objects otherwise. 


1.4 Cost of Storage Management 


It is not clear that a program will always have better performance running under an explicit storage 
manager than it will have running under a garbage collector. Appel [2] makes an argument that 
garbage collection can be faster than explicit storage management; he claims that it can even be 
faster than stack allocation. Appel’s claim is that 


with enough memory on the computer, it is more expensive to explicitly free a cell than 
it is to leave it for the garbage collector — even if the cost of freeing a cell is only a 
single machine instruction. 


Appel gives the cost per reachable object of copying garbage collection as 


$$ g = \frac{(c_1 + c_2 s)\,A}{M/s - A} \tag{1.1} $$


where c_1 is the number of operations required per object copied, c_2 is the number of operations 
per pointer, s is the average size of an object, A is the number of reachable objects when garbage 
collection is performed, and M is the size of the two memory spaces. If M is made sufficiently large 
relative to the other parameters, then the cost per reachable, or non-garbage, object can be made 
arbitrarily small. 


In the limiting case as the amount of available memory approaches infinity, Appel asserts that it is 
cheaper to rely on garbage collection than explicit storage management, even stack management, 
because the garbage collector will never have to run. At the other extreme, as the amount of memory 
approaches the average amount of memory in use at any time, the cost of garbage collection goes to 
infinity. In order to determine the crossover point where the cost of implicit memory management 
is less than the cost of explicit memory management, we must know the average amount of memory 
used by a program and the time constants c_1 and c_2 associated with garbage collection, relative to 
the cost of explicit storage management. 
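
The crossover point can be made concrete with a small calculation. The following Python sketch is only an illustration: it evaluates the cost expression (c_1 + c_2 s)A / (M/s - A) and solves it for the memory size M at which the collector's cost equals a given explicit cost. The function names and all numeric values are assumptions chosen for the example, not measurements.

```python
def gc_cost_per_object(c1, c2, s, A, M):
    """Cost expression (c1 + c2*s) * A / (M/s - A)."""
    headroom = M / s - A
    if headroom <= 0:
        raise ValueError("M/s must exceed the number of reachable objects A")
    return (c1 + c2 * s) * A / headroom


def crossover_memory(c1, c2, s, A, explicit_cost):
    """Memory size M at which the collector's cost equals explicit_cost.

    Obtained by solving (c1 + c2*s) * A / (M/s - A) = explicit_cost for M.
    """
    return s * A * (1 + (c1 + c2 * s) / explicit_cost)


# Hypothetical parameters: 10 operations per object, 3 per pointer,
# 4-word objects, 1000 reachable objects, explicit cost of 1 operation.
m_star = crossover_memory(c1=10, c2=3, s=4, A=1000, explicit_cost=1.0)
cost_at_crossover = gc_cost_per_object(10, 3, 4, 1000, m_star)
cost_with_double_memory = gc_cost_per_object(10, 3, 4, 1000, 2 * m_star)
```

With these made-up parameters the collector only matches a one-operation explicit free when memory is 23 times the reachable data, which is the sense in which the large-memory regime may be out of reach for large programs.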


Is it reasonable to assume, as Appel does, that we will be operating in the large-memory regime 
where the cost of garbage collection is insignificant? Although the cost per word of memory 
is continuously decreasing, the amount of memory needed for interesting problems seems to be 
increasing just as fast. It seems that the cost of garbage collection will be significant for the class of 
programs considered in this thesis because large programs will operate in the memory management 
regime where most of memory is in use and garbage collection is expensive. Nevertheless, the cost 
of explicitly allocating and deallocating an object by a general heap manager is very high, and so 
care must be taken to reduce the number of calls to the general heap manager. For this reason, 
we will consider some approaches to reusing storage directly or allocating objects in procedure 
activation frames. 


Appel does not consider the effect of locality on program execution time. Moon [32] states that 
the most important responsibility of a garbage collector in a system using virtual-memory is to 
keep data structures local; actually reclaiming storage is a secondary responsibility in this case. If 
a program has little locality of reference because it uses objects spread over a very large amount 
of memory, then the performance of the program will be very poor if the virtual-memory system 
thrashes. 


Is there some way for explicit storage management to cooperate with garbage collection? Many of 
the strict, functional languages use a reference counting garbage collector because these languages 
cannot create cyclic data structures. If a reference counting garbage collector is used, then reference 
counting of objects whose lifetime is known need not be performed. The reference count will be 
set to one when the object is created and set to zero when the object’s lifetime is over. The Id 
run-time system is likely to use a copying garbage collector, so that it can reclaim circular objects. 


def f_0 () = 
  { x = MakeTuple(6,847); 
    r = Select_1(x); 
    --- 
    Dealloc(x) 
  in r } 


Figure 1.1: Dealloc in a block expression 


Explicit storage allocation and deallocation can cooperate with a copying garbage collector by 
allocating and deallocating objects in a separate region. Garbage collection will not be performed 
on this region until all objects within it are garbage, in which case, the region may be used as free 
storage. Alternatively, this region can be treated as an older generation, and garbage collected 
infrequently, with promotion suppressed. 


1.5 Explicit Storage Deallocation in Id 


The first step in this work was to allow programmers to perform explicit storage management in 
Id. We introduced an experimental feature into the language for explicit deallocation of structures. 
The Dealloc primitive, along with --- (local barrier synchronization) allows Id programmers to 
insert commands that deallocate the storage associated with an object when that object is no longer 
in use. 


Programmer-directed deallocation will be performed to determine the costs and benefits of explicit 
deallocation in terms of program performance and the problem sizes that may be run without 
exhausting memory or invoking the garbage collector. 


The Dealloc primitive explicitly deallocates the storage associated with a structure in Id. In order 
to use the Dealloc primitive, we must have proper synchronization that prevents the Dealloc from 
executing until all uses of the structure to be deallocated have executed. For that reason, we have 
also introduced a barrier synchronization construct, denoted by three or more dashes: ---. 


In Id, unlike other parallel languages, a barrier is a local synchronization. A barrier can only appear 
within a letrec block, and its effects are limited to that letrec block. A barrier in Id ensures 
that the code in the block bindings before the barrier executes to termination before the code in 
the block bindings after the barrier. We will define a control region to be the program region 
containing a group of block bindings delimited by barriers. In Id, a control region terminates when 
all computation threads have exited the control region. In other words, all values in the region 
have been produced and all side effects have been performed. 
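
The effect of a local barrier can be imitated in an ordinary threaded program. The Python sketch below is an analogy only, not the Id implementation: each binding of a control region runs as a task, the barrier waits for every task to finish, and only then are the deallocations performed. The helper run_block and the dictionary used as a store are hypothetical, and the sketch is strict, so it does not model the early return of a non-strict block's result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_block(bindings, deallocs, store):
    """Run all bindings of one control region, then the deallocations.

    bindings: zero-argument callables that may read `store`.
    deallocs: labels to delete from `store` once the region has terminated.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(binding) for binding in bindings]
        # The barrier: every thread of the region must have exited.
        results = [future.result() for future in futures]
    for label in deallocs:
        del store[label]
    return results


store = {"x": (6, 847)}
results = run_block(
    bindings=[lambda: store["x"][0], lambda: store["x"][1]],
    deallocs=["x"],
    store=store,
)
```

Because the deletion happens only after every future has produced its result, no binding can observe the tuple after it has been removed.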


The example in Figure 1.1 contains a block with one control region consisting of the bindings of x 
and r. In this example, the object to which x is bound will be deallocated when the computation 
in both bindings in the control region has terminated. 


Invocation and termination of control regions are partially ordered. Invocation is the point in time 
when the interpreter first begins executing a portion of a control region, and termination is the 
point in time at which the interpreter finishes executing all code in a control region. Naturally, 
termination of any control region always occurs after invocation of that control region. 


def f_0() = 
  { x = MakeTuple(6, 847);          % cr_0 
    r = { y = Select_1(x);          % cr_1 
          z = Select_2(x); 
          r1 = y + z; 
        in r1 }; 
    --- 
    Dealloc(x)                      % cr_2 
  in r } 


Figure 1.2: Statically nested control regions 


Control regions may be composed by enclosing one control region within another or by placing a 
barrier between two control regions. If control region cr_0 statically encloses control region cr_1, then 
the invocation of cr_0 must precede the invocation of region cr_1 and the termination of cr_0 must 
follow the termination of cr_1. 


Definition 1.1 (Barrier Relation) The relation (cr_0 --- cr_1) holds if control region cr_0 is statically 
separated from cr_1 by a barrier and cr_0 comes before the barrier textually. 


If control region cr_0 is separated statically from control region cr_1 by a barrier, and cr_0 comes 
before cr_1, then both the invocation and termination of cr_0 must precede the invocation of cr_1. 


Consider the body of procedure f_0 in Figure 1.2. In this example there are three control regions: 
region cr_0 which is composed of the bindings of x and r, region cr_1 which is composed of the 
bindings of y and z, and region cr_2 which is composed of the deallocation command. Region cr_0 
encloses region cr_1; therefore, the invocation of cr_0 precedes that of cr_1, and termination of cr_1 
precedes that of cr_0. Region cr_0 is separated from region cr_2 by a barrier, and so both the invocation 
and termination of cr_0 must also precede the invocation of cr_2. The control region composition 
relations are transitive; therefore, the invocation and termination of region cr_1, enclosed by cr_0, 
must precede the invocation of region cr_2. 


The ordering of the invocation and termination of dynamically composed control regions follows 
from that of statically composed control regions. If control region cr_0 contains a procedure call, 
and cr_1 is the control region of the run-time instance of the body of that procedure call, then we 
say that cr_0 dynamically encloses cr_1. Therefore, the invocation of cr_0 will precede the invocation 
of cr_1 and the termination of cr_0 will follow the termination of cr_1. 


The example in Figure 1.3 is similar to Figure 1.2, except that control region cr_1 is in the body 
of procedure g. In this example, control region cr_1 is dynamically enclosed within control region 
cr_0 because procedure g is called from within control region cr_0. Therefore, the partial ordering of 
invocation and termination of control regions will be the same as in the previous example. Clearly, 
we must be able to name dynamic instances of control regions if we are going to be able to talk about 
the ordering of invocation and termination of those regions. This naming of dynamic instances is 
one of the topics we will discuss in more detail later in this thesis. 


def f_0() = 
  { x = MakeTuple(6, 847);          % cr_0 
    r = g(x); 
  in r } 

def g(x) = 
  { y = Select_1(x);                % cr_1 
    z = Select_2(x); 
    r1 = y + z; 
  in r1 } 


Figure 1.3: Dynamically nested control regions 


If one control region statically or dynamically encloses another, then the lifetime of the outer region 
will completely include the lifetime of the inner region. On the other hand, if two control regions 
are separated by a barrier, then the lifetime of the first will completely precede the lifetime of the 
second control region. From these two properties we determine that control regions form a natural tree. 


Definition 1.2 (Ancestor Relation) Control region cr_0 is an ancestor of region cr_1, or 

$$ cr_0 = cr_1 \uparrow^{n}, $$

if cr_0 statically or dynamically encloses region cr_1. 


In addition, we say that control region cr_1 is a descendent of region cr_0 if cr_0 is an ancestor of 
region cr_1. Every control region cr_0 will be considered to be an ancestor and a descendent of itself: 

$$ cr_0 = cr_0 \uparrow^{0} $$

The expression $cr_0 \uparrow^{n}$, where n ≥ 0, refers to the nth ancestor of region cr_0. 

We will now define two precedence relations on control regions. 


Definition 1.3 (Invocation Precedence) The invocation precedence relation 
$(cr_0 \preceq_I cr_1)$ is defined as follows: 

$$ cr_0 \preceq_I cr_1 \;\equiv\; (\exists n.\; cr_0 = cr_1 \uparrow^{n}) \;\lor\; (\exists n_0, n_1.\; cr_0 \uparrow^{n_0} \;\text{---}\; cr_1 \uparrow^{n_1}) $$

If $(cr_0 \preceq_I cr_1)$, then control region cr_0 must be invoked before cr_1 may be invoked. 


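Definition 1.4 (Termination Precedence) The termination precedence relation $(cr_0 \preceq_T cr_1)$ 
is defined as follows: 

$$ cr_0 \preceq_T cr_1 \;\equiv\; (\exists n.\; cr_0 \uparrow^{n} = cr_1) \;\lor\; (\exists n_0, n_1.\; cr_0 \uparrow^{n_0} \;\text{---}\; cr_1 \uparrow^{n_1}) $$

If $(cr_0 \preceq_T cr_1)$, then control region cr_0 must terminate before cr_1 may terminate. 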
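
Both precedence relations can be checked mechanically once the enclosing and barrier relations are known. The Python sketch below is an illustration only: it encodes the three regions of Figure 1.2 by hand, and the names PARENT, BARRIER, invoked_before, and terminates_before are hypothetical. It assumes the barrier disjunct of both definitions means that some ancestor of the first region lies before a barrier that precedes some ancestor of the second.

```python
# Figure 1.2: cr_0 encloses cr_1, and a barrier separates cr_0 from cr_2.
PARENT = {"cr_0": None, "cr_1": "cr_0", "cr_2": None}
BARRIER = {("cr_0", "cr_2")}  # (region before barrier, region after barrier)


def ancestors(cr):
    """cr itself (the 0th ancestor), then its parent, grandparent, ..."""
    chain = []
    while cr is not None:
        chain.append(cr)
        cr = PARENT[cr]
    return chain


def barrier_separated(a, b):
    """True if an ancestor of a comes before a barrier preceding an ancestor of b."""
    return any((x, y) in BARRIER for x in ancestors(a) for y in ancestors(b))


def invoked_before(a, b):
    """Invocation precedence: a is an ancestor of b, or a barrier separates them."""
    return a in ancestors(b) or barrier_separated(a, b)


def terminates_before(a, b):
    """Termination precedence: b is an ancestor of a, or a barrier separates them."""
    return b in ancestors(a) or barrier_separated(a, b)
```

For Figure 1.2 this gives invoked_before("cr_0", "cr_1"), terminates_before("cr_1", "cr_0"), and terminates_before("cr_1", "cr_2"), the last one through the barrier between cr_0 and cr_2.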


1.6 Safety of Explicit Deallocation 


Note that the use of the Dealloc primitive is inherently unsafe. The programmer may try to 
deallocate structures that are shared between various parts of the program, causing all sorts of 
errors to occur. Just as in other languages with explicit allocation and deallocation, Dealloc 
introduces the possibility of dereferencing dangling pointers. Therefore, the programmer must 
analyze his program to verify the safety of each explicit deallocation performed. In this section, 
we will see a set of informal conditions that must be met in order to safely deallocate storage in a 
program. 


Conceptually, there is a single condition that must be satisfied in order to safely deallocate or reuse 
an object. An object may be deallocated when there are no further references to the object. In an 
implicitly managed system, an object will be deallocated when there are no live references to the 
object. A live reference to a data structure is a reference, or pointer, that is stored in either the 
activation frame of a procedure invocation, or in a static variable or in a data structure to which 
there is a live reference. In a system in which the programmer must explicitly manage storage, an 
object may be deallocated as soon as it can be guaranteed that no further use will be made of the 
object, and so the lifetime of objects may be shorter than in a system with garbage collection. 


One way to guarantee that there will be no further references to an object is to put the deallocation 
command in a control region that executes after all uses of the object have executed. 
Here is a condition that describes when it is safe to deallocate an object. 


Condition 1.5 (Deallocation Safety) Given two control regions cr_0 and cr_1 and a variable x 
and structure ol bound to x in both regions, it is safe to deallocate the structure ol in control region 
cr_1 if 


1. $\forall cr.\; \mathit{UsedIn}(ol, cr) \Rightarrow \exists n.\; cr_0 = cr \uparrow^{n}$, 

2. $cr_0 \;\text{---}\; cr_1$, 

3. the only use of ol in region cr_1 is in the deallocation of ol, and 

4. the only deallocation of ol is in region cr_1, 


where UsedIn (ol, cr) is true if object ol is allocated or dereferenced in control region cr. 


The first two subconditions of Condition 1.5 guarantee that all uses of the structure bound to 
variable x in control region cr_0 have terminated before the Dealloc in control region cr_1 can 
execute. The third subcondition guarantees that there are no references to the contents of x in cr_1 
that may execute after x is deallocated, and the fourth subcondition guarantees that the structure 
bound to x is deallocated only once. 


Condition 1.5 is sufficient to ensure the safety of the deallocation of structure x. This condition 
is very conservative; it means that the control region containing the producer and all the control 
regions containing consumers of the structure have terminated before the deallocation statement 
executes. 
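
As an informal illustration of Condition 1.5, the four subconditions can be tested against an explicit description of the control regions. The Python sketch below is an assumption-laden toy, not the analysis developed in this thesis: the region tree, the barrier pairs, and the use and deallocation sets are written out by hand for Figure 1.2, and the function safe_to_deallocate is hypothetical.

```python
def ancestors(cr, parent):
    """cr itself, then each enclosing region up to the root."""
    chain = []
    while cr is not None:
        chain.append(cr)
        cr = parent[cr]
    return chain


def safe_to_deallocate(ol, cr0, cr1, parent, barriers, uses, deallocs):
    """Check the four subconditions of Condition 1.5 for object label ol.

    uses:     (object label, region) pairs where the object is allocated
              or dereferenced (the UsedIn relation).
    deallocs: (object label, region) pairs that contain a Dealloc.
    """
    using = {cr for (o, cr) in uses if o == ol}
    all_uses_under_cr0 = all(cr0 in ancestors(cr, parent) for cr in using)  # 1
    separated = (cr0, cr1) in barriers                                      # 2
    no_other_use_in_cr1 = cr1 not in using                                  # 3
    single_dealloc = {cr for (o, cr) in deallocs if o == ol} == {cr1}       # 4
    return (all_uses_under_cr0 and separated
            and no_other_use_in_cr1 and single_dealloc)


# Figure 1.2, with "ol" standing for the tuple bound to x.
parent = {"cr_0": None, "cr_1": "cr_0", "cr_2": None}
barriers = {("cr_0", "cr_2")}
uses = {("ol", "cr_0"), ("ol", "cr_1")}
deallocs = {("ol", "cr_2")}
ok = safe_to_deallocate("ol", "cr_0", "cr_2", parent, barriers, uses, deallocs)
bad = safe_to_deallocate("ol", "cr_0", "cr_2", parent, barriers,
                         uses | {("ol", "cr_2")}, deallocs)
```

Here ok is True, while adding a dereference of the tuple to cr_2 makes the check fail, because that use could run after the Dealloc.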


Figures 1.4 and 1.5 show procedures from the Gamteb [8] photon transport simulation benchmark. 
The procedure compton, shown in Figure 1.4, contains a Dealloc command that satisfies Condition 1.5. 
The variable new_particle is bound to a newly allocated tuple in the first binding in 
the body of procedure compton; the structure to which new_particle is bound is deallocated in 
the control region after the barrier. Note that procedure transport_particle uses new_particle 
but does not store it anywhere or return it as a value. Procedure compton allocates nine words of 
storage for the new particle. Adding the Dealloc statement allows that storage to be reclaimed as 
soon as compton terminates. 


The procedure handle_collision, shown in Figure 1.5, contains a binding of variable t_particle 
to a structure allocated in the body of procedure photo_elect and used in the body of compton. 
The control regions in which this structure is allocated and used are all descendents of the control 
region in which t_particle is bound. The deallocation of the structure bound to t_particle in 
the control region after the barrier is safe because the allocation and all other uses of this structure 
occurred either in the control region before the barrier or in descendents of that control region, 
and so all of these uses must have terminated before the deallocation command is invoked. 


In the rest of this thesis, we show how to verify the safety of a deallocation command at run time. 
We show how to check the safety of deallocation commands at compile-time using a conservative 
approximation of when objects may be allocated, dereferenced, and deallocated. We also present 
an algorithm for inserting safe deallocation commands at compile-time. 


def transport_particle xsect_table bins particle prob = 
  { x,y,z, u,v,w, wt, e, e_bin, cell, seed = particle; 
    pcompton, ppair, pphoto, ptotal = prob; 
    d_surf, surface = dist_to_surface x y z u v w; 
    rand, randi = grand seed; 
    d_coll = dist_to_collision ptotal rand; 
    bin_counts = if (d_coll >= d_surf) then 
                   move_to_surface d_surf 
                 else % (d_coll < d_surf) 
                   handle_collision d_coll; 
  in 
    bin_counts}; 


defsubst compton particle d_coll xsect_table bins = 
  { %% Allocate a new particle, deallocate it in this context. 
    new_particle = (new_x,new_y,new_z, ... new_seed ); 
    r = 
      if ekill then 
        ... 
      else 
        (transport_particle xsect_table bins 
                            new_particle new_prob); 
    --- 
    Dealloc new_particle ; 
  in r }; 


Figure 1.4: Procedure compton 




defsubst handle_collision d_coll = 
  { %% t_particle is allocated within photo_elect, 
    %% and deallocated in handle_collision. 
    t_particle, absorb, wt_kill = 
      photo_elect particle d_coll pphoto ptotal ; 
    counts = 
      if (not wt_kill) and (randi < p_compton) then 
        compton t_particle d_coll xsect_table bins 
      else 
        ... ; 
    r = add_counts counts col_counts ; 
    --- 
    Dealloc t_particle ; 
  in r } 


Figure 1.5: Procedure handle_collision 


Chapter 2 


Problem Statement 


In order for the compiler to verify or insert explicit storage deallocation code in programs, it must 
be able to determine the lifetimes of the objects being deallocated. Thus, the compiler must have 
some notion of the run-time behavior of the program being compiled. In this thesis, the compiler 
will use an abstraction of the operational semantics to determine the lifetimes of objects. 


This thesis develops a method for lifetime analysis that is directly applicable to parallel, single- 
assignment languages. In particular, we will be using the language KID~ as the basis for the 
analysis. 


The first step in developing our lifetime analysis algorithm is to define a standard operational 
semantics for the language of interest. One can define the operational semantics in terms of an 
abstract machine, in terms of a term rewrite system, or in terms of an interpreter. We define the 
operational semantics in terms of an interpreter because that allows us to stay close to the original 
source code of the program, rather than compiling into object code for the abstract machine. 


This chapter describes KID~ syntax and gives its semantics in terms of an interpreter. In the 
first section, we define the notation used throughout this thesis. In the second section, we define 
the syntax of KID~ programs and the value domains over which KID~ programs operate. In the 
third section, we define the standard KID~ interpreter and give examples of its operations. In the 
fourth section, we define the deallocation problem in terms of the KID~ interpreter, and in the 
fifth section we will give an overview of the development of our solution in the rest of this thesis. 


2.1 Notation 


We will adopt the convention of using double brackets, ⟦ e ⟧, around program text. Environments 
will be represented by ρ; looking up variable x in environment ρ will be represented by ρ[x], 
and binding variable x to value v in environment ρ by ρ[v/x]. Stores will be represented by σ; 
dereferencing a location l in store σ will be written as σ[l], and binding location l to value v in store 
σ will be written as σ[l ↦ v]. The expression P(X) indicates the powerset, or set of all subsets, 
of X. Tuples will be written with angle-brackets and elements separated by commas: ⟨v_1, ..., v_n⟩. 
Tagged structures will be written with a subscript tag, ⟨_tag v_1, ..., v_n⟩, and x.Tag will be used to 
refer to the tag of such a structure. The expression D_⊥ constructs a new domain that consists of 
the elements of domain D plus a new element ⊥ which is less than all elements of D. 


F          ::=  f_0 | f_1 | ...                                  User Function Names 
X          ::=  x_0 | x_1 | x_2 | ...                            Identifiers 
SE         ::=  1 | 2 | 3 | True | False | X                     Simple Expressions 
L          ::=  l_0 | l_1 | l_2 | ...                            Expression Labels 
tag ∈ Tag   =   {1, 2, 3, ...}                                   Oneof Tags 
OP         ::=  + | - | And | Or | ...                           Primitive Operators 
             |  MakeTuple | Select_i 
             |  MakeArray_F | Fetch 
             |  MakeOneof_{tag,n} | Select_{tag,n} | Is?_{tag} 
             |  Cons | Nil | Hd | Tl | Nil? 
E          ::=  SE | ^L OP(SE, ..., SE)                          Expressions 
             |  ^L F(SE, ..., SE)                                Function Applications 
             |  if (SE, BE, BE)                                  Conditionals 
Bs         ::=  X = BE; ...; X = BE                              Block Bindings 
Ds         ::=  Dealloc(X); ...; Dealloc(X)                      Deallocation Commands 
BE         ::=  {Bs --- Ds in X} | E                             Letrec Blocks 
pr ∈ Prog  ::=  { ... F(X, ..., X) = BE; ... ; }                 Programs 


Table 2.1: KID~ syntax 


We use the adjective concrete to refer to values from the standard and instrumented value domains. 
These are values that arise during actual execution of a program. We use the adjective abstract 
to refer to values that arise during abstract interpretation of a program. These values summarize 
all the possible values that could arise during the execution of a program under the standard or 
instrumented interpreters. Hats on values ($\hat{v}$) or functions ($\hat{f}$) will be used to denote the abstraction 
of some value whenever it is not clear from the context that we are talking about an abstract value. 


The metalanguage in which the interpreter is written has strict semantics. Letrec blocks in the 
metalanguage are written “{ x = e in z }” and have recursive, i.e., letrec, scoping rules. The 
metalanguage can be viewed as a mathematical notation, in which there is no notion of order 
of evaluation. It can also be viewed as an abstract syntax for a functional language. All of the 
definitions written in this thesis could be written in a strict, functional language. 
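
To make the environment and store notation concrete, the sketch below renders lookup, binding, dereference, and update as pure functions in Python. This is an illustration only: the function names, the use of dictionaries, and the choice of None to stand for the undefined element are assumptions of the sketch, not part of the metalanguage.

```python
BOTTOM = None  # stands in for the undefined element of a lifted domain


def bind(env, x, v):
    """Environment binding: a new environment that maps x to v."""
    return {**env, x: v}


def lookup(env, x):
    """Environment lookup; an unbound identifier yields BOTTOM."""
    return env.get(x, BOTTOM)


def update(store, l, v):
    """Store update: a new store in which location l holds v."""
    return {**store, l: v}


def deref(store, l):
    """Store dereference; an unbound location yields BOTTOM."""
    return store.get(l, BOTTOM)


env = bind({}, "x", "l0")
store = update({}, "l0", (6, 847))
value_of_x = deref(store, lookup(env, "x"))
```

Each operation returns a new mapping and leaves its argument untouched, which matches the reading of the metalanguage as mathematical notation with no order of evaluation.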


2.2 Syntax of KID~ 


KID~ is intended to be an intermediate language used when compiling Id programs. For this 
reason, it lacks some features that Id has, such as pattern matching, and so KID~ programs can 
be rather verbose. KID~ does not have loop expressions; in this work, we interpret and analyze 
loops by translating them into tail-recursive functions. 
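
The shape of this translation can be seen in a small example. The Python sketch below is only an analogy (the function names are made up, and Python itself does not optimize tail calls): the loop variables become parameters of a function whose recursive call is in tail position.

```python
def sum_to_loop(n):
    """Sum 1..n with an explicit loop."""
    total, i = 0, 1
    while i <= n:
        total, i = total + i, i + 1
    return total


def sum_to_rec(n, i=1, total=0):
    """The same computation with the loop variables as parameters."""
    if i > n:
        return total
    return sum_to_rec(n, i + 1, total + i)
```

Each iteration of the loop corresponds to one activation of sum_to_rec, so an analysis that names procedure activations can also name loop iterations.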


KID~ is a sugared form of the lambda calculus. Functions are named, and a program consists of 
a top-level recursive block defining the functions in the program. This allows a concise expression 
of simple programs. The recursive scoping of the program block obviates the need for the Y 
combinator in the language. Expressions are either constants, variables, conditional expressions, 
or applications of functions or primitive functions. The syntax of KID~ is shown in Table 2.1. 


A program consists of a recursively scoped block of function definitions. A program must define 
a function named f_0 that takes no arguments; this corresponds to the main function in a C 
program. Interpretation of a program begins by invoking this function. Nested functions are not 
allowed in this language. For a treatment of how to transform a set of nested function definitions 
into a flat set of function definitions, see the description of lambda lifting in [25]. Also, no currying, 
or partial application, of functions is supported. Hochheiser [21] describes the compilation of a 
language with currying into a KID-like intermediate form with no currying. Higher-order functions 
and closures will be discussed in Section 9.1. 


Because KID~ is a first-order language, identifiers are separated into function identifiers F and 
value identifiers X. Simple expressions are either constants or identifiers. Expressions can be simple 
expressions, primitive operator applications, function applications, conditional or block expressions. 
Primitive and user function applications are labeled with a static label drawn from domain L, the 
set of static expression labels. This expression label will be used in the interpreter to identify 
objects and procedure activations. 


KID~ expressions are divided into two major categories: simple expressions (SE) and expressions 
(E). The division simplifies many of the clauses of the interpreter, because simple expressions 
cannot modify or reference the store; they can only reference the environment. All expressions, 
except block and conditional expressions, consist of an operator and a number of simple expression 
parameters. In these expressions, each of the parameters can be evaluated by the simple expression 
evaluator, which does not take or return a store, thus reducing the number of stores that are 
defined. This reduces the clutter in the evaluator definition. Use of SE is even more pronounced 
in the instrumented and abstracted interpreters, where more values are returned by the expression 
evaluator. 


Block expressions consist of a set of recursively scoped bindings, a synchronization barrier, and a 
set of deallocation statements. The interpretation of block expressions is rather involved because 
KID~ is non-strict and because the scoping of variables in blocks is recursive, i.e., block expressions 
are letrec blocks. The result of a block expression is the value of the final identifier x in the block’s 
inner environment and the block’s inner store. The return value may be returned as soon as it is 
available — block expressions are non-strict and the result value is unaffected by the synchronization 
barrier. After all computation in the bindings has terminated, each of the deallocation statements 
is executed — the deallocation statements are hyperstrict in each of the bound variables. 


Anywhere a block expression is expected, a single expression may be used instead. This allows 
expressions such as 
  { x = e; 
  in x } 
to be written simply as e whenever x is not a free variable of expression e. 


The predicate of a conditional is a simple expression, but both branches must be block expressions. 
Also, the bodies of function definitions must be block expressions. This formulation of the syntax 
ensures that every structure that is allocated is initially bound to an identifier, because the only 
place that a structure allocation primitive can occur is on the right-hand side of a block binding. 


KID~ has primitives for constructing and manipulating three types of aggregate objects. The 
primitive MakeTuple takes n arguments and constructs an n-tuple from their values. The primitive 
Select_i takes a tuple and returns the ith component. The primitive MakeArray_F takes a length 
parameter n and some additional arguments, and constructs an array of n elements where each 
element of the array is the value of function F applied to the index and the additional argument 
values. Fetch takes an array and an index and returns the corresponding element of the array. 


n ∈ N              =  {1, 2, 3, ...}                         Integers 
b ∈ B              =  True + False                           Booleans 
l ∈ L              =  {l_0, l_1, l_2, ...}                   Expression Labels 
a ∈ AL             =  ε + AL:L                               Activation Labels 
ol ∈ OL            =  AL:L                                   Object Labels 
v ∈ V              =  (N + B + OL)_⊥                         Denotable Values 
v_tuple ∈ Tuple    =  ⟨_tuple V, ..., V⟩                     Tuples 
v_array ∈ Array    =  ⟨_array n, V, ..., V⟩                  Arrays 
v_oneof ∈ Oneof    =  ⟨_{tag,n} V, ..., V⟩                   Oneofs 
v_list ∈ List      =  ⟨_Cons V, OL⟩ + ⟨_Nil⟩                 Lists 
sv ∈ SV            =  (Tuple + Array + Oneof + List)_⊥       Storable Values 
σ ∈ Store          =  OL → SV                                Stores 


Table 2.2: Standard value domains 
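
As a rough illustration of the tuple and array primitives described before the table, the following Python sketch gives strict, store-free stand-ins for MakeTuple, Select_i, MakeArray_F, and Fetch. The function names are hypothetical, and counting components and indices from 1 is an assumption of the sketch; non-strictness and object labels are deliberately left out.

```python
def make_tuple(*values):
    """MakeTuple: build an n-tuple from n argument values."""
    return tuple(values)


def select(i, tup):
    """Select_i: the ith component, counting from 1 (an assumption)."""
    return tup[i - 1]


def make_array(f, n, *extra):
    """MakeArray_F: n elements, where element k is f(k, *extra)."""
    return [f(k, *extra) for k in range(1, n + 1)]


def fetch(arr, index):
    """Fetch: the element at a 1-based index (an assumption)."""
    return arr[index - 1]


pair = make_tuple(6, 847)
scaled = make_array(lambda k, factor: k * factor, 3, 10)
```

In this sketch the subscript F of MakeArray_F becomes an ordinary function argument, and the additional arguments are passed through to every element computation.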


Algebraic types are tagged sums of types. In KID~ we represent algebraic types by oneofs, which 
are tagged sums of tuples. MakeOneof_{j,n_tags} takes m arguments and constructs a oneof tagged with 
j and m components that belongs to a type with n_tags different disjuncts. The tag j of a oneof 
must be in the range 0 < j ≤ n_tags. Select_{j,i} takes a oneof and returns the ith component of that 
oneof, if the tag of that oneof was j. Is?_j takes a oneof and returns True if the tag of that oneof 
was j. 


2.3 KID~ Domains 


KID~ has the usual types of values: integers, booleans, tuples, arrays, algebraic types (oneofs), 
and lists. These value domains are defined in Table 2.2. Figure 2.2 contains the definitions of the 
least-upper-bound operators on the standard value domains. Figure 2.3 contains the definitions of 
the ordering operators on the standard value domains. The domains are all naturally ordered. 
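
As an informal illustration of how the least-upper-bound and ordering operators behave on denotable values and tuples, the following Python sketch may help. It is not part of the interpreter: the names `BOT`, `TOP`, `lub_v`, `leq_v`, `lub_tuple`, and `leq_tuple` are hypothetical, and tuples are modeled as plain Python tuples.

```python
class _Sym:
    """A named marker object, used for the bottom and top elements."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

BOT = _Sym("BOT")  # undefined value (bottom)
TOP = _Sym("TOP")  # conflicting value (top)

def lub_v(v1, v2):
    """Least upper bound of two denotable values in a flat domain."""
    if v1 is BOT:
        return v2
    if v2 is BOT:
        return v1
    if v1 is TOP or v2 is TOP:
        return TOP
    return v1 if v1 == v2 else TOP

def leq_v(v1, v2):
    """Ordering on denotable values: bottom is below everything, top above."""
    return v1 is BOT or v2 is TOP or v1 == v2

def lub_tuple(t1, t2):
    """Component-wise lub of two tuples; TOP if the arities differ."""
    if len(t1) != len(t2):
        return TOP
    return tuple(lub_v(a, b) for a, b in zip(t1, t2))

def leq_tuple(t1, t2):
    """Component-wise ordering of two tuples; False if the arities differ."""
    return len(t1) == len(t2) and all(leq_v(a, b) for a, b in zip(t1, t2))
```

For instance, `lub_tuple((BOT, BOT), (2, BOT))` yields `(2, BOT)`, which is how a partially defined tuple gains information from one approximation to the next.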


In order to model sharing of objects properly we will use a store that maps unique labels to tuples.
A label unbound in a store will map to $\bot$. Tuples will be passed by reference; we will refer to
tuples by their object labels. The actual tuple will reside in an associated store. A denotable value
that is a label of an object makes no sense without an associated store. Denotable values, drawn
from domain V, are either numbers, booleans, object labels, or $\bot$ (undefined).


Object Labels 


In order to determine the lifetime of objects, we must be able to distinguish one object from another. 
Therefore, when objects are created they must be assigned a unique label. We will always refer to 
the objects by this label. 


There are many ways a unique label could be allocated for an object. We could use something 
like gensym to create arbitrary new, unique labels. However, in order to implement the non-strict 
interpreter, we must be able to deterministically generate a unique label for each instance of each 
allocation command. We will see later in this chapter that evaluating non-strict, recursive letrec 
{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = g(t);
      Dealloc(t);
      in result };


  def g(t) =
    Select_1(t)


Figure 2.1: Labeled deallocation example 


blocks involves fixpoint iteration over successive approximations of the recursive environment. Each 
time we improve the approximation of the environment and store of a letrec expression, we must 
get the same object labels for each object allocated in the block. Therefore the structure of labels 
must be tied to the structure of the program in some manner so that when we summarize labels 
we summarize information about particular parts of the program. 


In order to name objects uniquely in our interpreter, we will use both a static label from the 
allocation primitives and a dynamic label identifying the particular invocation of that primitive. 
Therefore, the domain OL of object labels will consist of two components: a unique activation label 
and a static label that denotes the expression in the program that allocated the structure. 


Static labels are assigned to each expression in a KID~ program. We will only display the pertinent 
labels on allocation primitives and function applications. These labels will be placed to the left and 
above the expression that they annotate. In Figure 2.1, we have labeled the MakeTuple primitive
with $l_0$ by placing $l_0$ to the upper left of the MakeTuple expression. This label forms the static
portion of the object label of any tuple allocated by executing this particular expression.


Activation Labels 


We will call the dynamic portion of object labels their activation labels. Activation labels are drawn 
from the domain AL, whose structure will be described below. Our scheme for labeling activations 
is similar to that of Harrison. 


In [19], Harrison uses a pair consisting of a variable name and a procedure string to uniquely
name variable instances. In his system, every function expression (lambda abstraction) is uniquely
named statically, e.g., $\lambda^{a_0}$. The language he is modeling has call-cc and this must be reflected
in procedure strings, which name a sequential execution path through a program. A procedure
string consists of a sequence of lambda names with a superscript of d or u to indicate the entrance
or exit from an instance of that procedure, e.g., the procedure string $a_0^d a_1^d a_1^u$ indicates entering
the body of lambda $a_0$, entering the body of lambda expression $a_1$, followed by exiting the body
of lambda expression $a_1$. Harrison shows that these labels uniquely name every instance of every
object allocated in the program.


Harrison’s scheme works well in a sequential language in which there is only a single thread of 
control that can be named by the procedure string. However, in a parallel language such as KID~, 
there is no sequential thread of control that can uniquely identify each object. Therefore, we will
use a variation of procedure strings based on the hierarchical rather than sequential ordering of
procedure activations. In this scheme, every expression within a function definition will be uniquely
labeled. An instantiation of a procedure f called from a procedure g executing in activation $\alpha$ can
be uniquely named by $\alpha$ followed by $k_i$, where $k_i$ is the label of the expression within procedure g
which calls procedure f. Thus, an activation label is the concatenation of the names of the edges
in the run-time call-tree, where each edge is labeled by the application expression that created that
edge. An important feature of these labels is that the label assigned to a particular instantiation of
a procedure invocation will always be the same regardless of the execution order of subexpressions.
Since object labels consist of an activation label paired with the expression label of the allocation
primitive expression, this feature carries over to object labels.


Our activation labels uniquely identify a particular invocation of a procedure during the execution
of a program. Activation labels AL denote a path down the call tree of a program. Activation
labels consist of a string of expression labels:

\[ \epsilon.k_1.k_2.\cdots.k_n, \]

where $\epsilon$ is the empty activation label and each of the $k_i$ is the expression label of a user function
application expression. We use strings of expression labels instead of strings of function names
because we must be able to distinguish two invocations of a single procedure within a given procedure
activation. Activation label $\epsilon$ is used as the activation label of the main body of the program. Each
time a procedure is called from activation $\alpha$, we construct a new activation label by appending
the expression label k of the function application to $\alpha$, separated by a ".", yielding $\alpha.k$.


For example, consider the definition of procedure fib, shown below. Procedure fib is recursive; it 
contains two calls to itself within the body. 


def fib(i) =
  { p = i < 2;
    r = if p then 1
        else { n1 = ^{k1}fib(i-1);
               n2 = ^{k0}fib(i-2);
               n3 = n1 + n2
               in n3 }
    in r };


If we invoke fib(3) in activation $\alpha$, then we get the following activation tree:
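\[
\begin{array}{rcl}
\bot \sqcup_V v &=& v \\
v \sqcup_V \bot &=& v \\
v \sqcup_V \top &=& \top \\
\top \sqcup_V v &=& \top \\
v_1 \sqcup_V v_2 &=&
  \begin{cases} v_1 & \text{if } v_1 = v_2 \\ \top & \text{otherwise} \end{cases} \\[1ex]
(\mathrm{Tuple}\ v_1, \ldots, v_{n_1}) \sqcup_{SV} (\mathrm{Tuple}\ w_1, \ldots, w_{n_2}) &=&
  \begin{cases} (\mathrm{Tuple}\ (v_1 \sqcup_V w_1), \ldots, (v_{n_1} \sqcup_V w_{n_1})) & \text{if } n_1 = n_2 \\
  \top & \text{otherwise} \end{cases} \\[1ex]
(\mathrm{Array}\ n_1, v_1, \ldots, v_{n_1}) \sqcup_{SV} (\mathrm{Array}\ n_2, w_1, \ldots, w_{n_2}) &=&
  \begin{cases} (\mathrm{Array}\ n_1, (v_1 \sqcup_V w_1), \ldots, (v_{n_1} \sqcup_V w_{n_1})) & \text{if } n_1 = n_2 \\
  \top & \text{otherwise} \end{cases} \\[1ex]
(_{tag_1,m_1}\ v_1, \ldots, v_{n_1}) \sqcup_{SV} (_{tag_2,m_2}\ w_1, \ldots, w_{n_2}) &=&
  \begin{cases} (_{tag_1,m_1}\ (v_1 \sqcup_V w_1), \ldots, (v_{n_1} \sqcup_V w_{n_1})) &
  \text{if } tag_1 = tag_2 \wedge m_1 = m_2 \wedge n_1 = n_2 \\
  \top & \text{otherwise} \end{cases} \\[1ex]
(\mathrm{Cons}\ v_1, v_2) \sqcup_{SV} (\mathrm{Cons}\ w_1, w_2) &=& (\mathrm{Cons}\ (v_1 \sqcup_V w_1), (v_2 \sqcup_V w_2)) \\
(\mathrm{Nil}) \sqcup_{SV} (\mathrm{Nil}) &=& (\mathrm{Nil}) \\[1ex]
\sigma_1 \sqcup_{Store} \sigma_2 &=& \lambda ol.\
  \begin{cases} \sigma_1[ol] \sqcup_{SV} \sigma_2[ol] & \text{if } ol \in OL \\ \bot & \text{otherwise} \end{cases}
\end{array}
\]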


Figure 2.2: Least upper bound operators on standard value domains 


Note that the activations labeled $\alpha.k_0$, $\alpha.k_1$, $\alpha.k_1.k_0$, and $\alpha.k_1.k_1$ can all proceed in parallel with
the parent. The only information we get from the activation labels is that parent activations
are initiated before their child activations and that parent activations terminate after their child
activations terminate.
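
To make the order-independence of activation labels concrete, here is a small Python sketch that enumerates the activations created below a call to fib. It is only an illustration: the function name `fib_activations`, the string encoding of labels, and the assignment of label `k1` to the call fib(i-1) and `k0` to the call fib(i-2) are assumptions made for this example.

```python
def fib_activations(i, alpha, order):
    """Return {activation label: argument} for every call made below fib(i).

    `alpha` is the label of the activation running fib(i); `order` is the
    order in which the two recursive calls are explored.
    """
    seen = {}
    if i < 2:
        return seen  # base case: fib makes no further calls
    # Assumed static labels of the two application expressions in fib.
    calls = {"k1": i - 1, "k0": i - 2}
    for k in order:
        child = alpha + "." + k  # new activation label: alpha.k
        seen[child] = calls[k]
        seen.update(fib_activations(calls[k], child, order))
    return seen

# Explore the calls in both orders; the resulting labels are the same.
first = fib_activations(3, "alpha", ["k1", "k0"])
second = fib_activations(3, "alpha", ["k0", "k1"])
```

Both `first` and `second` contain the same four labels, because each label depends only on the path of application expressions leading to the activation and not on when the activation starts.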


2.4 KID~ Interpreter


In this section we give an operational semantics for KID~ in terms of a standard interpreter. First,
we discuss the evaluation strategy used by the interpreter, then we present the overall structure of 
the interpreter, then we present the program evaluator, simple expression evaluator, and expression 
evaluator. Next we discuss the correctness of the interpreter. Finally, we discuss the deallocation 
problem and give an overview of our solution. 
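\[
\begin{array}{rcl}
\bot \sqsubseteq_V v_0 && \forall v_0 \\
v_0 \sqsubseteq_V \top && \forall v_0 \\
v_1 \sqsubseteq_V v_2 &=&
  \begin{cases} \mathit{True} & \text{if } v_1 = \bot \\ \mathit{True} & \text{if } v_1 = v_2 \\
  \mathit{False} & \text{otherwise} \end{cases} \\[1ex]
\sigma_1 \sqsubseteq_{Store} \sigma_2 &=& \bigwedge_{l_i \in OL} \sigma_1[l_i] \sqsubseteq_{SV} \sigma_2[l_i] \\[1ex]
(\mathrm{Tuple}\ v_1, \ldots, v_{n_1}) \sqsubseteq_{SV} (\mathrm{Tuple}\ w_1, \ldots, w_{n_2}) &=&
  \begin{cases} \bigwedge_{1 \le i \le n_1} (v_i \sqsubseteq_V w_i) & \text{if } n_1 = n_2 \\
  \mathit{False} & \text{otherwise} \end{cases} \\[1ex]
(\mathrm{Array}\ n_1, v_0, \ldots, v_{n_1 - 1}) \sqsubseteq_{SV} (\mathrm{Array}\ n_2, w_0, \ldots, w_{n_2 - 1}) &=&
  \begin{cases} \bigwedge_{0 \le i < n_1} (v_i \sqsubseteq_V w_i) & \text{if } n_1 = n_2 \\
  \mathit{False} & \text{otherwise} \end{cases} \\[1ex]
(_{tag_1,m_1}\ v_1, \ldots, v_{n_1}) \sqsubseteq_{SV} (_{tag_2,m_2}\ w_1, \ldots, w_{n_2}) &=&
  \begin{cases} \bigwedge_{1 \le i \le n_1} (v_i \sqsubseteq_V w_i) &
  \text{if } tag_1 = tag_2 \wedge m_1 = m_2 \wedge n_1 = n_2 \\
  \mathit{False} & \text{otherwise} \end{cases} \\[1ex]
(\mathrm{Cons}\ v_1, v_2) \sqsubseteq_{SV} (\mathrm{Cons}\ w_1, w_2) &=& (v_1 \sqsubseteq_V w_1) \wedge (v_2 \sqsubseteq_V w_2) \\
(\mathrm{Nil}) \sqsubseteq_{SV} (\mathrm{Nil}) &=& \mathit{True}
\end{array}
\]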


Figure 2.3: Ordering operators on standard domains 


2.4.1 Evaluation Strategy


This interpreter is somewhat novel in that it evaluates each expression more than once in order to 
implement the non-strictness of the KID~ language. Typically, an interpreter will evaluate each
expression exactly once. 


Consider the following KID~ fragment, which uses non-strictness to define the second component 
of the tuple in terms of the first component of the tuple. 


{ a = ^{l0}MakeTuple(x,y);
  x = 2;
  y = Select_1(a);
  in a }


There is no order in which we can evaluate the three bindings of this expression so that a single
pass completely specifies the expression. The evaluation strategy used by this interpreter is to repeatedly
evaluate subexpressions in successively improved environments and stores until a limit is reached
and the expression is fully evaluated. The interpreter will first approximate the environment of the
body of the block expression by creating an environment in which each of the bound variables is
bound to $\bot$. Then it will evaluate each of the right-hand-side expressions in that environment to
yield new approximations to the values of the bound variables and new approximations to the value
of the store. This process is repeated until both the environment and the store have stabilized.


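For this example, the interpreter would start with environment $\rho^0$ and store $\sigma^0$:

\[ \rho^0 = [\, a \mapsto \bot,\ x \mapsto \bot,\ y \mapsto \bot \,] \qquad \sigma^0 = \bot_{Store} \]

After evaluating each of the right-hand-sides in environment $\rho^0$ and store $\sigma^0$ and combining the
results into a new environment and store, we get $\rho^1$ and $\sigma^1$:

\[ \rho^1 = [\, a \mapsto \alpha{:}l_0,\ x \mapsto 2,\ y \mapsto \bot \,] \qquad
   \sigma^1 = [\, \alpha{:}l_0 \mapsto (\mathrm{Tuple}\ \bot, \bot) \,] \]

in which variables a and x have non-bottom bindings and label $\alpha{:}l_0$ is bound to a tuple containing
$\bot$ and $\bot$.

One more iteration would yield $\rho^2$ and $\sigma^2$:

\[ \rho^2 = [\, a \mapsto \alpha{:}l_0,\ x \mapsto 2,\ y \mapsto \bot \,] \qquad
   \sigma^2 = [\, \alpha{:}l_0 \mapsto (\mathrm{Tuple}\ 2, \bot) \,] \]

Now the first component of the tuple labeled $\alpha{:}l_0$ contains 2.

Yet one more iteration would yield $\rho^3$ and $\sigma^3$:

\[ \rho^3 = [\, a \mapsto \alpha{:}l_0,\ x \mapsto 2,\ y \mapsto 2 \,] \qquad
   \sigma^3 = [\, \alpha{:}l_0 \mapsto (\mathrm{Tuple}\ 2, \bot) \,] \]

in which variable y is now bound to 2.

Finally, we would reach the environment $\rho^4$ and store $\sigma^4$ of the completely evaluated block
expression:

\[ \rho^4 = [\, a \mapsto \alpha{:}l_0,\ x \mapsto 2,\ y \mapsto 2 \,] \qquad
   \sigma^4 = [\, \alpha{:}l_0 \mapsto (\mathrm{Tuple}\ 2, 2) \,] \]

in which all three variables have non-bottom bindings and the tuple has no bottom components.

We can tell that environment $\rho^4$ and store $\sigma^4$ have reached the fixpoint by iterating the process
one more time. This iteration yields the same result as the previous iteration; therefore, $\rho^4$ and $\sigma^4$
must be the complete value.

\[ \rho^5 = [\, a \mapsto \alpha{:}l_0,\ x \mapsto 2,\ y \mapsto 2 \,] \qquad
   \sigma^5 = [\, \alpha{:}l_0 \mapsto (\mathrm{Tuple}\ 2, 2) \,] \]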


The important thing to notice is that each expression was evaluated five times in order to reach 
the fixpoint. The evaluation strategy we have chosen had some effect on how we named objects. 
We had to be able to deterministically assign a label to an object in order to evaluate MakeTuple 
expressions multiple times. 
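
The iteration above can be replayed with a short Python sketch. This is a simplified model written for this one block, not the interpreter itself: it assumes the block binds a to a MakeTuple of x and y, x to 2, and y to the first component of a, and it uses `None` for bottom and the string `"alpha:l0"` for the object label. The names `step` and `evaluate` are hypothetical.

```python
BOT = None          # stands for the undefined value
LABEL = "alpha:l0"  # object label: activation alpha, expression label l0

def lub_v(old, new):
    """Keep the old value unless it is still undefined."""
    return new if old is BOT else old

def lub_tuple(old, new):
    """Component-wise lub of two tuples, where either may be undefined."""
    if old is BOT:
        return new
    if new is BOT:
        return old
    return tuple(lub_v(a, b) for a, b in zip(old, new))

def step(env, store):
    """Evaluate all three right-hand sides in the old env and store."""
    new_store = dict(store)
    # a = MakeTuple(x, y): always allocates under the same label
    new_store[LABEL] = lub_tuple(store.get(LABEL, BOT), (env["x"], env["y"]))
    # y = Select_1(a): follow the old binding of a into the old store
    target = store.get(env["a"], BOT) if env["a"] is not BOT else BOT
    y = target[0] if target is not BOT else BOT
    new_env = {
        "a": lub_v(env["a"], LABEL),
        "x": lub_v(env["x"], 2),      # x = 2
        "y": lub_v(env["y"], y),
    }
    return new_env, new_store

def evaluate():
    """Iterate until neither the environment nor the store changes."""
    env, store = {"a": BOT, "x": BOT, "y": BOT}, {}
    passes = 0
    while True:
        new_env, new_store = step(env, store)
        passes += 1
        if (new_env, new_store) == (env, store):
            return env, store, passes
        env, store = new_env, new_store

env, store, passes = evaluate()
```

Because `step` always reuses the label `"alpha:l0"`, repeated evaluation of the MakeTuple expression refines one store entry instead of creating a fresh object on every pass.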


Most interpreters for non-strict languages use a rewrite system, where subexpressions are rewritten 
when they are evaluated. We chose this evaluation strategy because when we abstract the inter- 
preter we want the compiler to analyze programs by recursively descending the program, evaluating 
as it goes along. We want the program being evaluated to have the same text as the program being 
annotated or verified; so we do not want to use a rewriting interpreter. 


Arrays 


Here is an example of the use of MakeArray. 


{
  def g1 (i, x, y) =
    ^{l0}MakeTuple(x,y,i);


  def f1 (n, x, y) =
    ^{l1}MakeArray_{g1} (n, x, y);
}


This example consists of two procedures. Procedure g1, which takes three values, allocates a
three-tuple containing the three values and returns the tuple as its result. Procedure f1 uses MakeArray
to construct an n-element array with $v_i$ as the ith element of the array:

\[ v_i = g1(i, x, y) \quad \text{where } 0 \le i < n. \]


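The value v and store $\sigma$ resulting from a call to procedure f1 with values 3, 22, and 23 in activation
$\alpha$ would be:

\[
\begin{array}{rcl}
v &=& \alpha{:}l_1 \\
\sigma &=& [\ \alpha{:}l_1 \mapsto (\mathrm{Array}\ 3,\ \alpha.l_1.0{:}l_0,\ \alpha.l_1.1{:}l_0,\ \alpha.l_1.2{:}l_0), \\
&& \ \ \alpha.l_1.0{:}l_0 \mapsto (\mathrm{Tuple}\ 22, 23, 0), \\
&& \ \ \alpha.l_1.1{:}l_0 \mapsto (\mathrm{Tuple}\ 22, 23, 1), \\
&& \ \ \alpha.l_1.2{:}l_0 \mapsto (\mathrm{Tuple}\ 22, 23, 2)\ ]
\end{array}
\]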
Please note that procedure g1 is invoked with the index i of the array element as well as the values
of the two extra parameters passed to MakeArray. Any number of additional parameters may be
passed to the element creation function through the extra parameters to MakeArray. These extra
parameters increase the expressiveness of the language without having higher-order functions.
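
The labeling of the array and its elements can be traced with a small Python sketch. It is a single-pass simplification (it does not take least upper bounds with earlier store contents), and the helper names `make_tuple`, `g1`, and `f1`, the string encoding of labels, and the argument values 22 and 23 are assumptions chosen to match the tuples shown above.

```python
def make_tuple(values, alpha, label, store):
    """Allocate a tuple under object label alpha:label and return the label."""
    ol = f"{alpha}:{label}"
    store[ol] = tuple(values)
    return ol

def g1(i, x, y, alpha, store):
    """Element function: allocates the tuple (x, y, i) at expression label l0."""
    return make_tuple((x, y, i), alpha, "l0", store)

def f1(n, x, y, alpha, store):
    """MakeArray at expression label l1: element i is built in activation alpha.l1.i."""
    ol = f"{alpha}:l1"
    elems = [g1(i, x, y, f"{alpha}.l1.{i}", store) for i in range(n)]
    store[ol] = ("Array", n, *elems)
    return ol

store = {}
v = f1(3, 22, 23, "alpha", store)
```

Each element is allocated by the same MakeTuple expression, so all three tuples share the static label l0 and are distinguished only by the activation labels alpha.l1.0, alpha.l1.1, and alpha.l1.2.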


Algebraic Types 


In some cases, we would like to represent a value whose type is one of a number of different types. 
In this case, we use an algebraic type, which is a disjoint union of the types. We will refer to 
algebraically typed objects as oneofs. In order to maintain type safety, a disjunct tag is maintained 
on these objects, and special operators are provided to construct and manipulate them. 


For example, consider the transaction algebraic type defined below. 


type transaction = deposit I | withdrawal I 


A transaction is either a deposit or a withdrawal. 


Transactions are represented by tagged structures in the standard semantics. A transaction object
is either a deposit:

\[ (_{0,2}\ n) \]

where the subscript 0,2 indicates the 0th disjunct of a type with two disjuncts, or a withdrawal:

\[ (_{1,2}\ m) \]

where the subscript 1,2 indicates the 1st disjunct of a type with two disjuncts. Any particular
transaction value will be either a deposit or a withdrawal.


The KID~ code to create and manipulate structures of the transaction type using these primitives
would look like:


def make_deposit(n) =
  ^{l0}MakeOneof_{0,2}(n);


def make_withdrawal(m) =
  ^{l1}MakeOneof_{1,2}(m);


def deposit_amount(d) =
  if Is_0?(d)
  then Select_{0,1}(d)
  else Error();


def withdrawal_amount(w) =
  if Is_1?(w)
  then Select_{1,1}(w)
  else Error();


def deposit?(t) =
  Is_0?(t);
where Error is a primitive that always returns bottom (and presumably drops the user into the
debugger).
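
A rough Python model of the oneof primitives used in the transaction example is sketched below. The representation of a oneof as a triple `(tag, ntags, values)`, the helper names `make_oneof`, `is_tag`, and `select`, and the concrete amounts are all illustrative assumptions; `None` stands for bottom.

```python
BOT = None  # stands for the undefined value

def make_oneof(tag, ntags, values, alpha, label, store):
    """Allocate a oneof with the given tag under object label alpha:label."""
    assert 0 <= tag < ntags, "tag must name one of the disjuncts"
    ol = f"{alpha}:{label}"
    store[ol] = (tag, ntags, tuple(values))
    return ol

def is_tag(tag, ol, store):
    """Is_tag?: True if the oneof stored at ol carries the given tag."""
    return store[ol][0] == tag

def select(tag, i, ol, store):
    """Select_{tag,i}: the ith component (1-based) if the tag matches, else BOT."""
    actual_tag, _, values = store[ol]
    return values[i - 1] if tag == actual_tag else BOT

store = {}
d = make_oneof(0, 2, [100], "alpha", "l0", store)  # a deposit of 100
w = make_oneof(1, 2, [40], "alpha", "l1", store)   # a withdrawal of 40
```

Selecting with the wrong tag, as in `select(0, 1, w, store)`, yields `BOT` rather than raising an error, mirroring the rule that a mismatched selection returns bottom.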


Some algebraic types are defined recursively. These types are used to represent things such as lists, 
trees, and graphs. Because lists are used so often in functional programs, KID~ has primitives 
specifically defined to create and manipulate list objects. 


2.4.2 Interpreter Structure 


This section introduces the structure of the standard interpreter and defines several properties that
the interpreter satisfies. The interpreter consists of three semantic functions, $\mathcal{SE}$, $\mathcal{E}$, and $\mathcal{PE}$, which
together interpret KID~ programs. The following are the signatures of the semantic functions that
make up the interpreter.

\[
\begin{array}{lcll}
\mathcal{SE} &:& SE \to Env \to V & \text{Evaluates simple expressions} \\
\mathcal{E} &:& E \to Env \to Store \to AL \to (V \times Store) & \text{Evaluates expressions} \\
\mathcal{PE} &:& Prog \to (V \times Store) & \text{Evaluates programs}
\end{array}
\]

where Env, the domain of environments, is defined as:

\[ Env = X \to V. \]


Environments map identifiers, or variables, to values. An identifier that is unbound in an
environment maps to $\bot$.


The function $\mathcal{PE}$ evaluates a program and returns a denotable value and a store as the result. The
function $\mathcal{E}$ takes an expression e, an environment $\rho$, a store $\sigma$, and an activation label $\alpha$ and returns
the denotable value and new store resulting from evaluating the expression e in $\rho$, $\sigma$, and $\alpha$. We
call a triple consisting of an environment, a store, and an activation label a context; it contains
the contextual information necessary to interpret an expression.


Definition 2.1 (Dynamic Context) A dynamic context is a triple consisting of an environment, 
a store, and an activation label. 


In order to show that the interpreter is sound, i.e., it terminates on well-behaved (non-looping)
programs, we must show that procedures $\mathcal{SE}$ and $\mathcal{E}$ are monotonic. Monotonicity is required to
show the existence of the fixpoints computed during evaluation of letrec blocks. The monotonicity
of $\mathcal{E}$ depends on a property called extensionality.


Definition 2.2 (Extensionality) A function f is extensive if,

\[ \forall x \in \mathit{Domain}(f).\ x \sqsubseteq f(x) \]

In other words, the result of f(x) always includes x; function f only adds information to its
argument.


We will have to show that $\mathcal{E}$ is extensive, that is, $\mathcal{E}$ only adds to the bindings of the store that it
takes as input. The set of locations bound in the store resulting from a call to evaluator $\mathcal{E}$ will be
a superset of the set of locations bound in the input store.
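Proposition 2.3 ($\mathcal{E}$ is extensive)

\[
\begin{array}{l}
\forall \rho \in Env,\ \sigma_0 \in Store,\ \alpha \in AL, \\
\exists (v, \sigma_1) = \mathcal{E}[\![ e ]\!]\,\rho\,\sigma_0\,\alpha, \\
\Rightarrow \sigma_0 \sqsubseteq \sigma_1
\end{array}
\]

The extensionality of $\mathcal{E}$ is used in the proof of the monotonicity of $\mathcal{E}$.

Proposition 2.4 ($\mathcal{E}$ is monotonic)

\[
\begin{array}{l}
\forall \rho_0, \rho_1 \in Env,\ \sigma_0, \sigma_1 \in Store,\ \alpha \in AL, \\
\exists (v_0, \sigma_0') = \mathcal{E}[\![ e ]\!]\,\rho_0\,\sigma_0\,\alpha, \\
\phantom{\exists} (v_1, \sigma_1') = \mathcal{E}[\![ e ]\!]\,\rho_1\,\sigma_1\,\alpha, \\
(\rho_0 \sqsubseteq \rho_1) \wedge (\sigma_0 \sqsubseteq \sigma_1) \Rightarrow (v_0 \sqsubseteq v_1) \wedge (\sigma_0' \sqsubseteq \sigma_1')
\end{array}
\]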


2.4.3 KID~ Program Evaluator


The program evaluator $\mathcal{PE}$ evaluates the main function by invoking the expression evaluator with
the text of the body of the function, an empty environment, an empty store, and an empty activation
label. The definition of the program evaluator is given below:

\[
\begin{array}{l}
\mathcal{PE}[\![ pr ]\!] = \\
\quad \{\ \{ \cdots\ f_i(x_{i,1}, \ldots, x_{i,n_i}) = e_i;\ \cdots \} = pr; \\
\quad \ \ \mathbf{in}\ \mathcal{E}[\![ e_0 ]\!]\ \bot_{Env}\ \bot_{Store}\ \epsilon\ \}
\end{array}
\]

where $f_0() = e_0$ is the definition of the main procedure $f_0$ in program pr.


The purpose of the program evaluator is to provide the initial environment to the expression 
evaluator so that it may evaluate the body of the program. Function identifiers are handled 
specially; they are not bound in the environment. A different model of program evaluation would 
yield an environment of functions, and one could invoke any of the procedures in the program 
with arbitrary arguments. We chose the whole program view because it is simple and because it 
is consistent with the approach of many systems where programs are compiled and run as a single 
unit with a single entry point. 


2.4.4 KID~ Simple Expression Evaluator 


The simple expression evaluator takes a simple expression and an environment and returns a
denotable value. It is used by the expression evaluator. Numeric and boolean literals are evaluated to
numeric and boolean constants. Identifiers, or variables, are evaluated by looking them up in the
environment.

\[
\begin{array}{rcll}
\mathcal{SE}[\![ n ]\!]\,\rho &=& n & \text{where } n \text{ is a number} \\
\mathcal{SE}[\![ b ]\!]\,\rho &=& b & \text{where } b \text{ is a boolean} \\
\mathcal{SE}[\![ x ]\!]\,\rho &=& \rho[x] & \text{where } x \text{ is a variable}
\end{array}
\]

Simple expressions cannot modify the store, so no store is passed into or returned from procedure
$\mathcal{SE}$.
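
As a minimal sketch of these three clauses, assuming that literals are modeled as Python ints and bools, identifiers as strings, environments as dictionaries, and bottom as `None` (the name `simple_eval` is hypothetical):

```python
BOT = None  # stands for the undefined value

def simple_eval(se, env):
    """Evaluate a simple expression in an environment.

    Literals evaluate to themselves; identifiers are looked up in the
    environment, and an unbound identifier maps to BOT.
    """
    if isinstance(se, (bool, int)):
        return se
    return env.get(se, BOT)
```

For example, `simple_eval("x", {"x": 7})` returns 7, while `simple_eval("y", {})` returns `BOT` because y is unbound.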
2.4.5 KID~ Expression Evaluator

The expression evaluator will now be defined as a dispatch function on the structure of the input
term. Remember that the expression evaluator takes an expression, an environment, a store, and
an activation label, and returns a value and a new store.


Evaluation of Simple and Primitive Expressions 


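The first three clauses of the interpreter define the semantics of constants and variables:

\[
\begin{array}{rcll}
\mathcal{E}[\![ n ]\!]\,\rho\,\sigma\,\alpha &=& (\mathcal{SE}[\![ n ]\!]\,\rho,\ \sigma) & \text{where } n \text{ is a number} \\
\mathcal{E}[\![ b ]\!]\,\rho\,\sigma\,\alpha &=& (\mathcal{SE}[\![ b ]\!]\,\rho,\ \sigma) & \text{where } b \text{ is a boolean} \\
\mathcal{E}[\![ x ]\!]\,\rho\,\sigma\,\alpha &=& (\mathcal{SE}[\![ x ]\!]\,\rho,\ \sigma) & \text{where } x \text{ is a variable}
\end{array}
\]

All three of these clauses call the simple expression evaluator, and in these clauses the input store
is returned unchanged because simple expressions cannot modify the store.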


The next clause shows the evaluation of a simple arithmetic expression. 


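\[
\begin{array}{l}
\mathcal{E}[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ v_2 = \mathcal{SE}[\![ se_2 ]\!]\,\rho \\
\quad \ \ \mathbf{in}\ (v_1 + v_2,\ \sigma)\ \}
\end{array}
\]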


The two operands are evaluated first, then the primitive operator + is applied to those values, 
and the result is returned. These primitive operators do not modify the store; consequently, it is 
returned unchanged. 


Evaluation of Function Applications 


We evaluate an application expression by evaluating its arguments and forming environment $\rho'$
and activation label $\alpha'$. Environment $\rho'$ is obtained by extending the empty environment with
bindings from each of the formal parameters to their actual values. We concatenate activation
label $\alpha$ with the expression label k of the application expression to form the new activation label $\alpha'$.
Then we evaluate the body of the function in environment $\rho'$, store $\sigma$, and activation label $\alpha'$. The
non-strictness of functions and data structures is handled by the implementation of letrec blocks,
shown later in this section. So if we evaluate the body of a procedure before all of its arguments
have been evaluated, those arguments will be undefined ($\bot$) or partially defined (if they are bound
to labels of data structures).

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{k}f(se_1, \ldots, se_n) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ \vdots \\
\quad \ \ v_n = \mathcal{SE}[\![ se_n ]\!]\,\rho; \\
\quad \ \ \rho' = \bot_{Env}[v_1/x_1, \ldots, v_n/x_n]; \\
\quad \ \ \alpha' = \alpha.k; \\
\quad \ \ \mathbf{in}\ \mathcal{E}[\![ e ]\!]\,\rho'\,\sigma\,\alpha'\ \}
\end{array}
\]

where $f(x_1, \ldots, x_n) = e$ is a definition in the program.

Note that the body of function f is evaluated in a new environment and a new activation $\alpha.k$
consisting of the current activation label $\alpha$ concatenated with the expression label k of the application
expression.




Evaluation of Conditionals 


Conditionals are evaluated by first interpreting the predicate, and then interpreting one of the 
branches of the conditional depending on the value of the predicate. 


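\[
\begin{array}{l}
\mathcal{E}[\![ \mathrm{if}(se_0, e_1, e_2) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \mathbf{if}\ \mathcal{SE}[\![ se_0 ]\!]\,\rho = \mathit{True} \\
\quad \mathbf{then}\ \mathcal{E}[\![ e_1 ]\!]\,\rho\,\sigma\,\alpha \\
\quad \mathbf{else}\ \mathcal{E}[\![ e_2 ]\!]\,\rho\,\sigma\,\alpha
\end{array}
\]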


Evaluation of Block Expressions 


Evaluation of KID~ letrec blocks is rather complex because they have recursive scope and because
KID~ is non-strict. They are evaluated by solving the recursive equations resulting from
interpreting each of the binding right-hand-sides in an environment that has the letrec block variables
bound to the values of the binding right-hand-sides. This recursive equation is solved by fixpoint
iteration of function EvalBindings, starting with an initial approximation of the environment that
binds each of the $x_i$ to bottom and an initial approximation of the store equal to the incoming
store.


After the bindings have been evaluated completely, the deallocation statements are executed. The 
deallocation statements have no effect in the standard interpreter, but they will be modeled more 
precisely in the instrumented interpreter in Chapter 3. 


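\[
\begin{array}{l}
\mathcal{E}[\![ \{ Bs;\ Ds\ \mathbf{in}\ x \} ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ [\![ x_1 = e_1; \cdots; x_n = e_n ]\!] = Bs; \\
\quad \ \ [\![ \mathrm{Dealloc}(y_1); \cdots; \mathrm{Dealloc}(y_m) ]\!] = Ds; \\
\quad \ \ \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
\quad \ \ (\rho', \sigma') = \mathit{EvalBindings}(Bs, \rho_0, \sigma, \alpha); \\
\quad \ \ \mathbf{in}\ (\rho'[x],\ \sigma')\ \}
\end{array}
\]

where

\[
\begin{array}{l}
\mathit{EvalBindings}([\![ x_1 = e_1; \cdots; x_n = e_n ]\!], \rho, \sigma, \alpha) = \\
\quad \{\ (v_1, \sigma_1) = \mathcal{E}[\![ e_1 ]\!]\,\rho\,\sigma\,\alpha; \\
\quad \ \ \vdots \\
\quad \ \ (v_n, \sigma_n) = \mathcal{E}[\![ e_n ]\!]\,\rho\,\sigma\,\alpha; \\
\quad \ \ \rho' = \rho[(v_1 \sqcup \rho[x_1])/x_1, \ldots, (v_n \sqcup \rho[x_n])/x_n]; \\
\quad \ \ \sigma' = \bigsqcup_i \sigma_i; \\
\quad \ \ (\rho'', \sigma'') = \mathbf{if}\ \rho' = \rho \wedge \sigma' = \sigma \\
\quad \qquad\qquad\quad\ \mathbf{then}\ (\rho', \sigma') \\
\quad \qquad\qquad\quad\ \mathbf{else}\ \mathit{EvalBindings}([\![ x_1 = e_1; \cdots; x_n = e_n ]\!], \rho', \sigma', \alpha) \\
\quad \ \ \mathbf{in}\ (\rho'', \sigma'')\ \}
\end{array}
\]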


Evaluation of Tuple Primitives 


The next two clauses give the evaluation rules for tuple data structures. The primitive MakeTuple
takes m values and returns a structure containing those m values. This clause constructs a unique
object label ol by pairing the current activation label and the expression label of the MakeTuple
expression, and returns a store that has ol bound to the new tuple in the incoming store. The
object label is returned as the value of the expression. This clause only adds information to the
store, thus preserving the extensionality of $\mathcal{E}$.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{l}\mathrm{MakeTuple}(se_1, \ldots, se_m) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ \vdots \\
\quad \ \ v_m = \mathcal{SE}[\![ se_m ]\!]\,\rho; \\
\quad \ \ ol = \alpha{:}l; \\
\quad \ \ v_{tuple} = (\mathrm{Tuple}\ v_1, \ldots, v_m); \\
\quad \ \ \sigma' = \sigma[ol \mapsto (v_{tuple} \sqcup \sigma[ol])]; \\
\quad \ \ \mathbf{in}\ (ol,\ \sigma')\ \}
\end{array}
\]


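Tuple selection is accomplished by evaluating the argument to the $\mathrm{Select}_i$ primitive, yielding an
object label, and looking up the value of that object label in the current store. The ith component
of that tuple is returned as a value, along with the current store.

\[
\begin{array}{l}
\mathcal{E}[\![ \mathrm{Select}_i(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (\mathrm{Tuple}\ v_1, \ldots, v_n) = \sigma[ol]; \\
\quad \ \ \mathbf{in}\ (v_i,\ \sigma)\ \}
\end{array}
\]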


Evaluation of Array Primitives 


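The following three clauses give the evaluation rules for array data structure operators: MakeArray,
Fetch, and Bounds. The primitive $\mathrm{MakeArray}_{f_j}$ takes a simple expression that evaluates to length
n and r simple expressions that evaluate to values to pass to function $f_j$, and makes an array of
length n where the ith component is $f_j(i, v_1, \ldots, v_r)$. Note that this clause only adds information
to the store, thus preserving the extensionality of $\mathcal{E}$.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{k}\mathrm{MakeArray}_{f_j}(se_0, se_1, \ldots, se_r) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \alpha{:}k; \\
\quad \ \ n = \mathcal{SE}[\![ se_0 ]\!]\,\rho; \\
\quad \ \ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ \vdots \\
\quad \ \ v_r = \mathcal{SE}[\![ se_r ]\!]\,\rho; \\
\quad \ \ (u_0, \sigma_0) = \mathcal{E}[\![ e_j ]\!]\,(\bot_{Env}[0/x_0, v_1/x_1, \ldots, v_r/x_r])\,\sigma\,(\alpha.k.0); \\
\quad \ \ \vdots \\
\quad \ \ (u_{n-1}, \sigma_{n-1}) = \mathcal{E}[\![ e_j ]\!]\,(\bot_{Env}[n-1/x_0, v_1/x_1, \ldots, v_r/x_r])\,\sigma\,(\alpha.k.(n-1)); \\
\quad \ \ v_{array} = (\mathrm{Array}\ n, u_0, \ldots, u_{n-1}); \\
\quad \ \ \sigma' = \sigma[ol \mapsto (v_{array} \sqcup \sigma[ol])] \sqcup \Big( \bigsqcup_{0 \le i < n} \sigma_i \Big); \\
\quad \ \ \mathbf{in}\ (ol,\ \sigma')\ \}
\end{array}
\]

where $f_j(x_0, x_1, \ldots, x_r) = e_j$ is a definition in the program.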


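The primitive Fetch takes an array a and an index i, and returns the ith component of a.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{k}\mathrm{Fetch}(se_1, se_2) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ i = \mathcal{SE}[\![ se_2 ]\!]\,\rho; \\
\quad \ \ (\mathrm{Array}\ n, v_0, \ldots, v_{n-1}) = \sigma[ol]; \\
\quad \ \ \mathbf{in}\ (v_i,\ \sigma)\ \}
\end{array}
\]

The primitive Bounds takes an array and returns the length of the array.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{k}\mathrm{Bounds}(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (\mathrm{Array}\ n, v_0, \ldots, v_{n-1}) = \sigma[ol]; \\
\quad \ \ \mathbf{in}\ (n,\ \sigma)\ \}
\end{array}
\]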


Evaluation of Algebraic Type Primitives 
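The following three clauses define the behavior of the interpreter on the primitives that allocate
oneofs, select components from oneofs, and test the tags of oneofs. The primitive $\mathrm{MakeOneof}_{tag,n_{tags}}$
allocates a oneof whose tag is tag and which belongs to a type with $n_{tags}$ tags and whose elements
are the values of simple expressions $se_1$ through $se_m$. The primitive $\mathrm{Is}_{tag}?$ returns True if the
tag of the oneof to which simple expression se evaluates is tag. The primitive $\mathrm{Select}_{tag,i}$ returns
the ith component of the oneof to which se evaluates if the tag of that object is tag; otherwise it
returns $\bot$.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{l}\mathrm{MakeOneof}_{tag,n_{tags}}(se_1, \ldots, se_m) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ \vdots \\
\quad \ \ v_m = \mathcal{SE}[\![ se_m ]\!]\,\rho; \\
\quad \ \ ol = \alpha{:}l; \\
\quad \ \ v_{oneof} = (_{tag,n_{tags}}\ v_1, \ldots, v_m); \\
\quad \ \ \sigma' = \sigma[ol \mapsto (v_{oneof} \sqcup \sigma[ol])]; \\
\quad \ \ \mathbf{in}\ (ol,\ \sigma')\ \} \\[1ex]
\mathcal{E}[\![ \mathrm{Is}_{tag}?(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (_{tag',n_{tags}}\ v_1, \ldots, v_m) = \sigma[ol]; \\
\quad \ \ b = \mathbf{if}\ tag = tag'\ \mathbf{then}\ \mathit{True}\ \mathbf{else}\ \mathit{False}; \\
\quad \ \ \mathbf{in}\ (b,\ \sigma)\ \} \\[1ex]
\mathcal{E}[\![ \mathrm{Select}_{tag,i}(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (_{tag',n_{tags}}\ v_1, \ldots, v_m) = \sigma[ol]; \\
\quad \ \ v = \mathbf{if}\ tag = tag'\ \mathbf{then}\ v_i\ \mathbf{else}\ \bot; \\
\quad \ \ \mathbf{in}\ (v,\ \sigma)\ \}
\end{array}
\]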
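Evaluation of List Primitives


The following five clauses give the semantics of the list manipulation primitives. The primitive
constructor Cons takes an element x and a list $v_{list}$ and constructs a new list with x as its head
and $v_{list}$ as its tail. The primitives Hd and Tl take a list and return the head and tail, respectively,
of the list. The constructor Nil returns a new empty list. The predicate Nil? returns True if the
value is Nil and False otherwise.

\[
\begin{array}{l}
\mathcal{E}[\![ {}^{l}\mathrm{Cons}(se_1, se_2) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho; \\
\quad \ \ v_2 = \mathcal{SE}[\![ se_2 ]\!]\,\rho; \\
\quad \ \ ol = \alpha{:}l; \\
\quad \ \ v_{cons} = (\mathrm{Cons}\ v_1, v_2); \\
\quad \ \ \sigma' = \sigma[ol \mapsto (v_{cons} \sqcup \sigma[ol])]; \\
\quad \ \ \mathbf{in}\ (ol,\ \sigma')\ \} \\[1ex]
\mathcal{E}[\![ \mathrm{Hd}(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (\mathrm{Cons}\ v_1, v_2) = \sigma[ol]; \\
\quad \ \ \mathbf{in}\ (v_1,\ \sigma)\ \} \\[1ex]
\mathcal{E}[\![ \mathrm{Tl}(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ (\mathrm{Cons}\ v_1, v_2) = \sigma[ol]; \\
\quad \ \ \mathbf{in}\ (v_2,\ \sigma)\ \} \\[1ex]
\mathcal{E}[\![ {}^{l}\mathrm{Nil}() ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \alpha{:}l; \\
\quad \ \ \sigma' = \sigma[ol \mapsto ((\mathrm{Nil}) \sqcup \sigma[ol])]; \\
\quad \ \ \mathbf{in}\ (ol,\ \sigma')\ \} \\[1ex]
\mathcal{E}[\![ \mathrm{Nil?}(se) ]\!]\,\rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho; \\
\quad \ \ b = \mathbf{if}\ \sigma[ol].tag = \mathrm{Nil}\ \mathbf{then}\ \mathit{True}\ \mathbf{else}\ \mathit{False}; \\
\quad \ \ \mathbf{in}\ (b,\ \sigma)\ \}
\end{array}
\]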


2.4.6 Soundness of Standard Interpreter 


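Theorem 2.3 The interpreter $\mathcal{E}$ is extensive with respect to the store.

\[
\begin{array}{l}
\forall \rho \in Env,\ \sigma_0 \in Store,\ \alpha \in AL, \\
\exists v \in V,\ \sigma_1 \in Store \\
(v, \sigma_1) = \mathcal{E}[\![ e ]\!]\,\rho\,\sigma_0\,\alpha \Rightarrow \sigma_0 \sqsubseteq \sigma_1
\end{array}
\]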


Proof: 


By structural induction: 


— The clauses that interpret simple expressions and arithmetic primitives return the store 
unchanged, and so these clauses are extensive with respect to the store. 


— The clauses that interpret conditional and function application expressions call the in- 
terpreter on their subexpressions with their input store. Assuming that interpretation 
of subexpressions (the inductive case) is extensive, then the interpretation of conditional 
and function application expressions is extensive with respect to the store. 
— To prove the extensionality of the clause that interprets letrec blocks, we have to show
that EvalBindings is extensive with respect to the store. This function computes the
solution to the set of recursive equations formed by the block bindings. The solution
consists of an environment $\rho'$ and a store $\sigma'$. The store $\sigma'$ must include the input store
$\sigma$, because EvalBindings calls the interpreter on the binding right-hand sides with the
input store $\sigma$, and then takes the least upper bound of the resulting stores. Both of
these steps are extensive.


— Each of the clauses that allocate structures are extensive because they only add a binding 
to the store. 


— Each of the clauses that fetch values from structures are extensive because they return 
the input store unchanged. 


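Theorem 2.4 The interpreter functions $\mathcal{SE}$ and $\mathcal{E}$ are monotonic. Given simple expression se,
expression e, and activation label $\alpha$, we show monotonicity with respect to the environment and
store:

\[
\begin{array}{l}
\forall \rho_0, \rho_1 \in Env,\ \sigma_0, \sigma_1 \in Store, \\
\rho_0 \sqsubseteq \rho_1 \Rightarrow \mathcal{SE}[\![ se ]\!]\,\rho_0 \sqsubseteq \mathcal{SE}[\![ se ]\!]\,\rho_1 \\
\rho_0 \sqsubseteq \rho_1 \wedge \sigma_0 \sqsubseteq \sigma_1 \Rightarrow
  \mathcal{E}[\![ e ]\!]\,\rho_0\,\sigma_0\,\alpha \sqsubseteq \mathcal{E}[\![ e ]\!]\,\rho_1\,\sigma_1\,\alpha
\end{array}
\]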
Proof: 


First $\mathcal{SE}$, by structural induction:


— The clauses that evaluate numeric and boolean literals always return the values of those
literals; therefore, $\mathcal{SE}[\![ c ]\!]\,\rho_0 \sqsubseteq \mathcal{SE}[\![ c ]\!]\,\rho_1$, because the values of those literals are
independent of the environment.


— The clause that evaluates identifiers looks up the identifier in the environment. If $\rho_1$ is
more defined than $\rho_0$, then the value of x in $\rho_1$ must be at least as well defined as the
value of x in $\rho_0$. Therefore, $\mathcal{SE}[\![ x ]\!]\,\rho_0 \sqsubseteq \mathcal{SE}[\![ x ]\!]\,\rho_1$.


Now $\mathcal{E}$, by structural induction:


— The clauses that evaluate constants and literals are all monotonic because the values
they return are from calls to the simple evaluator, which is monotonic, and the stores
they return are the incoming stores.

— The clauses that evaluate arithmetic and relational operators are all monotonic because
these operators are monotonic (e.g., $(\bot + 2) \sqsubseteq (3 + 2)$) and the values passed to these
operators are obtained from the simple expression evaluator, which is monotonic.


— In the clause that evaluates function applications, the argument values are obtained from
the simple expression evaluator, so they increase monotonically as the environment gets
more defined. We use our induction hypothesis to show that evaluation of the function
body is monotonic, because recursive calls to $\mathcal{E}$ are assumed to be monotonic.


— If we assume that the semantic conditional returns $\bot$ if the predicate is undefined, then
as the predicate gets more defined the result of the conditional gets more defined. If the
predicate is either True or False, then the behavior is monotonic because we assumed
that the subsequent calls to $\mathcal{E}$ are monotonic.
{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = g(t);
      Dealloc(t);
      in result };


  def g(t) =
    Select_1(t)


Figure 2.4: Simple deallocation example 


— To show that evaluation of letrec blocks is monotonic, we must show that function
EvalBindings is monotonic. Because $\mathcal{E}$ is extensive, each of the new stores created in
EvalBindings is at least as defined as the incoming store, so EvalBindings is monotonic
in the return store. EvalBindings is monotonic in the return environment because the
new environment is created by binding each variable $x_i$ to the least upper bound of the
new approximation of its value and its binding in the previous environment. Evaluation
of letrec blocks is monotonic because EvalBindings is monotonic and because we do
not remove the binding of a label from the store when it is deallocated.


— Evaluation of each of the allocation primitives is monotonic because they are extensive
with respect to the stores, and because they use the simple expression evaluator to
evaluate their arguments.


— Evaluation of each of the selection primitives is monotonic because they return the value
of a structure from the input store. If the store becomes more defined then the value of
a structure in the store must stay at least as well defined as it was before.


2.5 The Deallocation Problem 


We are trying to solve two related problems. One problem is: given a program with deallocation 
statements in it, verify the correctness of those deallocation statements. The second problem is to 
insert deallocation statements into a program automatically. 


In either case, we must know when a deallocation command is correct. In the program in Figure 2.4, 
procedure f contains a statement deallocating the object bound to variable t. This statement is 
correct only if the structure to which t is bound was allocated within the body of f, the structure 
does not escape from the result of f, and there is no other statement deallocating that structure.


Thus, we are interested in four important bits of information for any program or procedure in a
program:


• the identities of the objects to which variables are bound;


• the identities of the objects that procedures allocate;


• the identities of the objects that procedures return; and


• the identities of the objects that procedures deallocate.


The first bit of information is used to determine the other three and to associate lifetime information 
with the program once it has been analyzed. The second and third bits let us determine which 
objects are reachable, and potentially live, outside of the current procedure activation. These two 
pieces of information are also used to determine which objects have lifetimes completely bounded 
by the lifetime of a procedure’s activation frame. Given more precise information about the order 
of execution of a procedure body, its arguments, and its child procedure calls, we could perform 
better dependence analysis that would tell us which objects that are live in this procedure activation 
frame are needed after termination of this activation frame. The last bit of information is necessary 
in order to prevent errors that can occur if the heap manager is requested to deallocate the same 
object more than once. 


2.6 Overview of Our Solution 


The goal of this thesis is to develop an analysis that yields the necessary information to verify or
insert storage reclamation code. In the next three chapters, we develop a solution to the problem
of determining object lifetimes at compile-time. Chapter 3 describes an interpreter for KID$^-$
that allows us to determine the unique identities of objects at run-time and to determine exactly
when these objects are allocated and when they are no longer reachable. Chapter 4 describes
an abstraction of this semantics that allows us to compute a generalization of the lifetimes of
objects over all executions of a program. Chapter 5 gives algorithms for verifying and inserting
deallocation statements using information from lifetime analysis. Chapters 6, 7, 8, and 9 extend
the value domains and the standard, instrumented, and abstract interpreters to handle arrays,
lists, and I-structures. Chapter 10 describes the compile-time and run-time performance of programs
automatically annotated by the compiler, and Chapter 11 presents the conclusions we have reached
during this work.


Chapter 3


Instrumented Semantics 


Before we can formalize the conditions that must be satisfied statically by a correct storage
deallocation command, we must know the conditions that must be satisfied dynamically for that
deallocation command to be correct. The standard KID$^-$ interpreter treats the deallocation primitive
as a no-op, and so it is not sufficient for our purposes.


In order to determine that a deallocation command is correct, we must be able to determine that 
no reference is made to an object after it is deallocated. In a sequential interpreter, we could mark 
the location when it was deallocated, and any further reference to that location would produce an 
error. In our interpreter, however, we cannot mark an object as deallocated because of the way we 
evaluate letrec blocks — stores are repeatedly passed through all subexpressions of the block. 


In this chapter, the standard semantics will be augmented to collect information about which 
objects were allocated, dereferenced or deallocated by each expression. These collections of events 
can be examined after a program has been interpreted to see if any object was dereferenced after 
it was deallocated. 


The activation labels defined in Chapter 2 give a partial order on the time of execution of the 
instances of each subexpression in a program. In this chapter, we will assign new activation labels 
for the body of each letrec block as well as for each procedure application, so that we can measure 
finer differences in execution times. Activation labels as defined earlier are sufficient to distinguish 
each object that is created by a program, but they are not sufficient to distinguish in which control 
region a particular deallocation takes place. 


In the first section of this chapter, we will see how the information we would like to gather affects 
the structure of the instrumented interpreter. In the next section, we will present the instrumented 
interpreter. Following that, we will discuss the correctness of the interpreter with respect to the 
standard interpreter given in Chapter 2. Finally, we will work through the interpretation of a few 
examples. 


3.1 Instrumented Interpreter Characteristics 


The four pieces of information needed to verify the correctness of the deallocation of the structure
bound to a variable during the execution of a program will help us define the domains of the
instrumented interpreter and the signatures of its functions.


3.1.1 Collecting the Necessary Information


First, we must be able to identify individual objects in a program in order to determine when two 
variables are bound to the same object. We use the object labels defined in Chapter 2 to name 
objects uniquely. Although the activation label component of the object label has structure that 
allows us to determine the relative timing of object allocation and deallocation, we only consider 
equality of labels, not ordering on the structure of labels, when we manipulate sets of object labels. 
For instance, when we take the union of two sets of labels, we return a set containing all of the 
different labels. We do use the structure of individual labels as our notion of time of execution,
though.


Second, we must collect the labels of objects allocated during the evaluation of an expression. In the
standard interpreter for KID$^-$, each expression evaluates to a single complete value: a denotable
value and a store. In our instrumented interpreter, each expression evaluates to a denotable value,
a store, and three sets of events. These event collections name the objects that were allocated,
deallocated, and referenced, and the activations in which the events occurred.


Finally, we can examine the values to which expressions evaluate in order to see what locations 
may be reachable from the result of an expression. The result of interpretation of an expression is 
a denotable value and a store. We can traverse the value with respect to the store to determine 
the set of reachable locations. This information will be used to formulate a conservative safety 
condition for deallocation statements. We will be able to test this condition in the context of the 
deallocation statement, rather than during a postmortem after execution occurs (as is required 
when examining the allocation, deallocation, and reference events). 


3.1.2 Temporal Ordering of Execution


Once we have collected sets of allocation, deallocation, and dereferencing events, the next step is to 
give a partial order on the execution of these events. Activation labels have structure that allows 
us to use the hierarchical termination of activations as a measure of execution time. We use that 
to order events in time. 


In the instrumented interpreter, every distinct activation label names a different control region.
We extend the invocation and termination precedence relations from control regions to activation
labels. Thus, we say that an activation labeled $\alpha_0$ terminates before an activation labeled $\alpha_1$
if the termination of the control region labeled $\alpha_0$ must precede the termination of the control
region labeled $\alpha_1$. In other words, $\alpha_1$ is a prefix of $\alpha_0$; that is, the control region, or activation,
named by $\alpha_1$ is an ancestor of the activation named by $\alpha_0$. In the KID$^-$ interpreter, termination
proceeds hierarchically: parent activations cannot terminate until all of their child activations
have terminated.


Every letrec block in KID$^-$ has a group of bindings, a barrier, and a group of deallocation
commands. If the block expression has label $k$ and is executing in activation $\alpha$, then $\alpha.k$ will be
the label of the control region containing the group of bindings, and $\alpha.k^-$ will be the label of the
group of deallocation commands. These two activation labels satisfy the relation $(\alpha.k \preceq_T \alpha.k^-)$.


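Definition 3.1 (Activation Label Termination Order) The relation $\alpha_0 \preceq_T \alpha_1$ means that
activation $\alpha_0$ must terminate before activation $\alpha_1$ and is defined as follows:


$$\alpha_0 \preceq_T \alpha_1 \;\equiv\; (\alpha_0 = \alpha_1.z)$$


where $z$ is a string of zero or more expression labels.


$$
\begin{array}{rcll}
n \in N & = & \{\ldots, -1, 0, 1, \ldots\} & \text{Integers} \\
b \in B & = & True + False & \text{Booleans} \\
l \in L & = & \{l_0, l_1, l_2, \ldots\} & \text{Expression labels} \\
\alpha \in AL & = & \epsilon + AL.L & \text{Activation labels} \\
ol \in OL & = & AL : L & \text{Object labels} \\
v \in V & = & (N + B + OL)_\bot & \text{Denotable values} \\
v_{tuple} \in Tuple & = & (Tuple\ V, \ldots, V) & \text{Tuples} \\
v_{array} \in Array & = & (Array\ N, V, \ldots, V) & \text{Arrays} \\
v_{oneof} \in Oneof & = & (tag, n\ V, \ldots, V) & \text{Oneofs} \\
v_{list} \in List & = & (Cons\ V, OL) + (Nil) & \text{Lists} \\
sv \in SV & = & (Tuple + Array + Oneof + List)_\bot & \text{Storable values} \\
\sigma \in Store & = & OL \rightarrow SV & \text{Stores} \\
\rho \in Env & = & X \rightarrow V & \text{Environments}
\end{array}
$$


Figure 3.1: Instrumented semantic domains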


In other words, activation $\alpha_1$ must terminate before activation $\alpha_0$ terminates if $\alpha_0$ is an ancestor
of $\alpha_1$. If activation $\alpha_0$ is preceded by $\alpha_1$, then we will say that $\alpha_0$ is an ancestor of $\alpha_1$, or that $\alpha_0$
is higher in the call tree than $\alpha_1$.
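The termination order can be made concrete with a small sketch. It assumes, purely for illustration, that an activation label is represented as a Python tuple of expression labels with the empty tuple standing for $\epsilon$; the function name `terminates_before` and the sample labels are hypothetical and not part of the interpreter.

```python
from typing import Tuple

# Assumption: an activation label is a tuple of expression labels;
# the empty tuple plays the role of the root label (epsilon).
Activation = Tuple[str, ...]


def terminates_before(a0: Activation, a1: Activation) -> bool:
    """Return True when a0 = a1.z for some (possibly empty) string z,
    i.e. a1 is a prefix of a0 and so a0 must terminate first."""
    return a0[:len(a1)] == a1


root: Activation = ()
f_block: Activation = ("k1", "k2")
g_block: Activation = ("k1", "k2", "k0", "k3")

print(terminates_before(g_block, f_block))  # True: descendant before ancestor
print(terminates_before(f_block, g_block))  # False: ancestor outlives descendant
```

Sibling activations such as `("k1", "k2")` and `("k1", "k3")` are unrelated in either direction, which is why the order is only partial.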


We will use this notion of termination order to catch dangling pointer errors, and to give correctness 
conditions on programs to guarantee that no such errors will occur at run-time. 


3.2 An Instrumented Interpreter


Now that we have formulated some of the criteria that the instrumented interpreter must satisfy,
let us develop the interpreter and its value domains in more detail. In this section we define an
instrumented interpreter for KID$^-$ based on the ideas presented earlier in this chapter and in the
previous chapter.


3.2.1 Semantic Domains 


As in the standard semantics, the instrumented semantics operates over integers, booleans, tuples, 
arrays, and lists. We will use the domains from Chapter 2, which are shown again for reference in 
Figure 3.1. The domain ordering and least upper bound were shown in Figure 2.2. 

In addition to the values and stores computed by the standard interpreter, the instrumented
interpreter computes three sets of events. An object event pairs the object label of an object that
was allocated, deallocated, or dereferenced with the activation label in which the allocation,
deallocation, or dereferencing occurred.


The domains of allocation events (AEVs), deallocation events (DEVs), and dereferencing events
(REVs) are defined as follows:

$$
\begin{array}{rcl}
AEVs & = & \mathcal{P}(OL \times AL) \\
DEVs & = & \mathcal{P}(OL \times AL) \\
REVs & = & \mathcal{P}(OL \times AL)
\end{array}
$$

Each type of event consists of an object label paired with the activation label denoting when that
event occurred. We will refer to allocation, deallocation, and dereferencing events collectively as
object events. The interpreter will collect sets of events, rather than sequences, because not all
events can be ordered.


3.2.2 Semantic Functions 


This section presents the definition of the instrumented interpreter, which augments the standard 
interpreter with mechanisms to collect object events. 


The following are the semantic functions that make up the instrumented interpreter: 


$$
\begin{array}{rcl}
\mathcal{E}_I & : & E \rightarrow Env \rightarrow Store \rightarrow AL \rightarrow (V \times Store \times AEVs \times DEVs \times REVs) \\
\mathcal{PE}_I & : & Prog \rightarrow (V \times Store \times AEVs \times DEVs \times REVs)
\end{array}
$$


The three extra values returned by the interpreter, $A^+ \in AEVs$, $A^- \in DEVs$, and $A^R \in REVs$,
tell us exactly which objects were allocated, deallocated, and dereferenced in each instance of each
expression.


The function $\mathcal{E}_I$ takes an expression, an environment, a store, and an activation label, and returns the
resulting value, the resulting store, and the sets of allocation, deallocation, and dereferencing events
yielded by the interpretation of that expression. The function $\mathcal{PE}_I$ takes a complete program and
returns the result value and store and the sets of allocation, deallocation, and dereferencing events
from the execution of the program.


Note that $\mathcal{E}_I$, like $\mathcal{E}$, is extensive with respect to stores and also monotonic. These properties are
necessary in order to prove that the instrumented interpreter terminates with a unique result.


Program Evaluator Definition 


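The definition of the program evaluator is almost exactly like that of the standard program evaluator,
except that it returns three sets of object events. Here is the definition of $\mathcal{PE}_I$, which interprets
programs.


$$
\begin{array}{l}
\mathcal{PE}_I[\![\, \{ \cdots f_i(x_{i,1}, \ldots, x_{i,n}) = e_i \cdots \} \,]\!] = \\
\quad \{\ (v, \sigma, A^+, A^-, A^R) = \mathcal{E}_I[\![ e_0 ]\!]\,\bot_{Env}\,\bot_{Store}\,\epsilon; \\
\quad\ \ \mathrm{in}\ (v, \sigma, A^+, A^-, A^R)\ \}
\end{array}
$$


where expression $e_0$ is the body of the main procedure $f_0$.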


Simple Expression Evaluator Definition 


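The instrumented interpreter uses the simple expression evaluator from the standard interpreter.
This is shown for reference in Figure 3.2.


$$
\begin{array}{rcll}
\mathcal{SE}[\![ n ]\!]\rho & = & n & \text{where } n \text{ is a number} \\
\mathcal{SE}[\![ b ]\!]\rho & = & b & \text{where } b \text{ is a boolean} \\
\mathcal{SE}[\![ x ]\!]\rho & = & \rho[x] & \text{where } x \text{ is a variable}
\end{array}
$$


Figure 3.2: Simple expression evaluator


$$
\begin{array}{rcl}
\mathcal{E}_I[\![ n ]\!]\rho\sigma\alpha & = & (\mathcal{SE}[\![ n ]\!]\rho, \sigma, \emptyset, \emptyset, \emptyset) \\
\mathcal{E}_I[\![ b ]\!]\rho\sigma\alpha & = & (\mathcal{SE}[\![ b ]\!]\rho, \sigma, \emptyset, \emptyset, \emptyset) \\
\mathcal{E}_I[\![ x ]\!]\rho\sigma\alpha & = & (\mathcal{SE}[\![ x ]\!]\rho, \sigma, \emptyset, \emptyset, \emptyset) \\
\mathcal{E}_I[\![ +(se_1, se_2) ]\!]\rho\sigma\alpha & = & (\mathcal{SE}[\![ se_1 ]\!]\rho + \mathcal{SE}[\![ se_2 ]\!]\rho, \sigma, \emptyset, \emptyset, \emptyset)
\end{array}
$$


Figure 3.3: Evaluation of simple expressions and primitive operators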


Expression Evaluator Definition 


In this section we discuss the definition of the instrumented expression evaluator. 


Simple expressions and primitive arithmetic and boolean operators are evaluated in a manner 
similar to that of the standard interpreter. The result is a quintuple consisting of the value, 
the incoming store, and three empty sets because simple expressions cannot update the store, or 
allocate, deallocate or dereference locations. These four clauses of the interpreter are shown in 
Figure 3.3. 


The clauses for evaluation of function applications and conditionals are shown in Figure 3.4. These 
clauses are the same as the corresponding clauses from the standard interpreter, except that evalu- 
ation of the body of the function and the taken branch of a conditional yield sets of object events. 


Evaluation of letrec blocks in the instrumented interpreter is similar to evaluation of letrec 
blocks in the standard interpreter, except that this interpreter must collect the sets of labels of 
objects allocated and deallocated by each binding right-hand-side. In addition, the body of the 
letrec block in the instrumented semantics will be evaluated in a new activation, whose label is the 
letrec block’s expression label concatenated to the current activation label. This new activation 
label gives us a more precise notion of when objects are allocated, deallocated, and dereferenced. 
This information will be used to determine if any dangling pointer errors occur. The labels of 
objects deallocated by the deallocation statements of a letrec block are returned with the set of 
labels of objects deallocated during execution of the bindings. The interpreter clause for letrec 
expressions is shown in Figure 3.5. 
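The repeated evaluation of the bindings of a letrec block can be illustrated with a deliberately simplified sketch. It assumes a flat value domain in which Python's `None` stands for $\bot$, models each right-hand side as a hypothetical Python function of the environment, and leaves out stores and event sets entirely, so it shows only the iteration-to-a-fixed-point idea and not the full clause.

```python
BOTTOM = None  # assumption: None stands for the undefined value (bottom)


def join(old, new):
    """Least upper bound on a flat domain: bottom lies below every value."""
    if old is BOTTOM:
        return new
    if new is BOTTOM or new == old:
        return old
    raise ValueError("no upper bound for two different defined values")


def eval_bindings(bindings, env):
    """Re-evaluate every right-hand side until the environment stops changing.

    bindings maps a variable name to a function from environment to value.
    """
    while True:
        results = {x: rhs(env) for x, rhs in bindings.items()}
        new_env = dict(env)
        for x, v in results.items():
            new_env[x] = join(env.get(x, BOTTOM), v)
        if new_env == env:          # fixed point reached
            return new_env
        env = new_env


def plus(a, b):
    """Strict addition: undefined if either argument is undefined."""
    return BOTTOM if a is BOTTOM or b is BOTTOM else a + b


# y is bound before the x it depends on; non-strict evaluation still converges.
bindings = {"y": lambda env: plus(env["x"], 1), "x": lambda env: 2}
final_env = eval_bindings(bindings, {"x": BOTTOM, "y": BOTTOM})
print(final_env)
```

On the first pass `y` is still undefined because `x` is; the second pass defines `y`, and the third pass detects that nothing changed.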


Figure 3.6 gives the evaluation rules for tuple data structures. These are similar to the corresponding
clauses of the standard interpreter, except that they return object events. MakeTuple returns the
same value and store in the instrumented interpreter as in the standard interpreter, but it also
returns three sets of object events. The dereferencing and deallocation event sets are both empty,
but the allocation event set consists of a single element: the object label paired with the current
activation label. The primitive Select$_i$ returns the $i$th component of the tuple, the incoming store,
empty sets of allocation and deallocation events, and a dereferencing event set consisting of a single
element: the object label of the argument paired with the current activation label.


$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{k}f(se_1, \ldots, se_n) \,]\!]\rho\sigma\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ \ \vdots \\
\quad\ \ v_n = \mathcal{SE}[\![ se_n ]\!]\rho; \\
\quad\ \ \alpha' = \alpha.k; \\
\quad\ \ (v, \sigma', A^+, A^-, A^R) = \mathcal{E}_I[\![ e ]\!]\,(\rho[v_1/x_1, \ldots, v_n/x_n])\,\sigma\,\alpha'; \\
\quad\ \ \mathrm{in}\ (v, \sigma', A^+, A^-, A^R)\ \} \\
\quad \text{where } f(x_1, \ldots, x_n) = e \text{ is a definition in the program} \\
\\
\mathcal{E}_I[\![\, \mathrm{if}(se_0, e_1, e_2) \,]\!]\rho\sigma\alpha = \\
\quad \mathrm{if}\ \mathcal{SE}[\![ se_0 ]\!]\rho \\
\quad \mathrm{then}\ \mathcal{E}_I[\![ e_1 ]\!]\rho\sigma\alpha \\
\quad \mathrm{else}\ \mathcal{E}_I[\![ e_2 ]\!]\rho\sigma\alpha
\end{array}
$$


Figure 3.4: Evaluation of conditional expressions
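The behavior of the tuple primitives described above can be sketched in a few lines. This is only an illustration under stated assumptions: activation labels are Python tuples of expression labels, an object label $\alpha : l$ is the pair `(alpha, l)`, stores are dictionaries, and the function names `make_tuple` and `select` are hypothetical stand-ins for the MakeTuple and Select$_i$ clauses. Arguments are passed as already-evaluated values rather than simple expressions.

```python
from typing import Dict, FrozenSet, Tuple, Union

Activation = Tuple[str, ...]              # () plays the role of epsilon
ObjectLabel = Tuple[Activation, str]      # (alpha, l) stands for alpha : l
Value = Union[int, bool, ObjectLabel]
Store = Dict[ObjectLabel, Tuple[Value, ...]]
Events = FrozenSet[Tuple[ObjectLabel, Activation]]
EMPTY: Events = frozenset()


def make_tuple(label: str, values, store: Store, alpha: Activation):
    """Allocate a tuple; report one allocation event, no other events."""
    ol = (alpha, label)
    new_store = dict(store)               # the incoming store is not mutated
    new_store[ol] = tuple(values)
    return ol, new_store, frozenset({(ol, alpha)}), EMPTY, EMPTY


def select(i: int, ol: ObjectLabel, store: Store, alpha: Activation):
    """Return component i (1-based); report one dereferencing event."""
    fields = store[ol]
    return fields[i - 1], store, EMPTY, EMPTY, frozenset({(ol, alpha)})


g_block = ("k1", "k2", "k0", "k3")
f_block = ("k1", "k2")
ol, store1, allocs, deallocs, derefs = make_tuple("l0", (68, 47), {}, g_block)
value, store2, allocs2, deallocs2, derefs2 = select(1, ol, store1, f_block)
print(value, derefs2)
```

Each call returns the same quintuple shape as the interpreter clauses: value, store, allocation events, deallocation events, and dereferencing events.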


Figure 3.7 contains the clauses of $\mathcal{E}_I$ for the array primitives. These clauses are the same as the
clauses from the standard interpreter except that they return sets of allocation, deallocation, and
dereferencing events in addition to a value and store. The primitive MakeArray$_{f_j}$ collects the
allocation, deallocation, and dereferencing events from each of the calls to $f_j$ and augments the
set $A^+$ of allocation events to include the allocation of object $ol$ in activation $\alpha$. The Fetch and
Bounds primitives record that the label $ol$ of the array passed to them was referenced in the current
activation $\alpha$ in $A^R$, the set of reference events that they return.


The evaluator clauses for algebraic types, given in Figure 3.8, and the evaluator clauses for list 
primitives, given in Figure 3.9, are similar to the corresponding clauses from the standard inter- 
preter, except that the constructors return non-empty allocation event sets, and the selectors and 
predicates return non-empty dereferencing event sets. 


3.2.3 Correctness of the Interpreter


We will consider the instrumented interpreter to be correct if the denotable value and the store 
returned from the execution of a program under the instrumented interpreter are always equal 
to the denotable value and store returned by the execution of the program under the standard 
interpreter. 


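Theorem 3.2 The instrumented interpreter is correct with respect to the standard interpreter.


$$
\begin{array}{l}
\forall pr \in Prog, \\
\quad (v_s, \sigma_s) = \mathcal{PE}[\![ pr ]\!], \\
\quad (v_i, \sigma_i, A^+, A^-, A^R) = \mathcal{PE}_I[\![ pr ]\!], \\
\quad (v_s = v_i) \wedge (\sigma_s = \sigma_i)
\end{array}
$$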


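Proof:


Informally: The instrumented interpreter computes the same values and stores as the standard
interpreter, because all portions of the interpreter that compute values and stores are
the same as the standard interpreter. $\Box$


$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{k}\{\, Bs\ \text{---}\ Ds\ \mathrm{in}\ x \,\} \,]\!]\rho\sigma\alpha = \\
\quad \{\ [\, x_1 = e_1; \ldots; x_n = e_n \,] = Bs; \\
\quad\ \ [\, Dealloc(y_1); \ldots; Dealloc(y_m) \,] = Ds; \\
\quad\ \ \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
\quad\ \ \alpha' = \alpha.k; \\
\quad\ \ (\rho', \sigma', A^{+\prime}, A^{-\prime}, A^{R\prime}) = EvalBindings_I(Bs, \rho_0, \sigma, \alpha'); \\
\quad\ \ A^{-\prime\prime} = A^{-\prime} \cup \{ (\rho'[y_i], \alpha') \mid 1 \le i \le m \}; \\
\quad\ \ \mathrm{in}\ (\rho'[x], \sigma', A^{+\prime}, A^{-\prime\prime}, A^{R\prime})\ \} \\
\text{where} \\
EvalBindings_I([\, x_1 = e_1; \ldots; x_n = e_n \,], \rho, \sigma, \alpha) = \\
\quad \{\ (v_1, \sigma_1, A^+_1, A^-_1, A^R_1) = \mathcal{E}_I[\![ e_1 ]\!]\rho\sigma\alpha; \\
\quad\ \ \ \vdots \\
\quad\ \ (v_n, \sigma_n, A^+_n, A^-_n, A^R_n) = \mathcal{E}_I[\![ e_n ]\!]\rho\sigma\alpha; \\
\quad\ \ \rho' = \rho[(v_1 \sqcup \rho[x_1])/x_1, \ldots, (v_n \sqcup \rho[x_n])/x_n]; \\
\quad\ \ \sigma' = \bigsqcup_i \sigma_i; \\
\quad\ \ A^{+\prime} = \bigcup_i A^+_i; \\
\quad\ \ A^{-\prime} = \bigcup_i A^-_i; \\
\quad\ \ A^{R\prime} = \bigcup_i A^R_i; \\
\quad\ \ (\rho'', \sigma'', A^{+\prime\prime}, A^{-\prime\prime}, A^{R\prime\prime}) = \\
\quad\ \ \quad \mathrm{if}\ \rho' = \rho \wedge \sigma' = \sigma \\
\quad\ \ \quad \mathrm{then}\ (\rho', \sigma', A^{+\prime}, A^{-\prime}, A^{R\prime}) \\
\quad\ \ \quad \mathrm{else}\ EvalBindings_I([\, x_1 = e_1; \ldots; x_n = e_n \,], \rho', \sigma', \alpha) \\
\quad\ \ \mathrm{in}\ (\rho'', \sigma'', A^{+\prime\prime}, A^{-\prime\prime}, A^{R\prime\prime})\ \}
\end{array}
$$


Figure 3.5: Evaluation of block expressions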


3.2.4 Soundness of the Instrumented Interpreter 


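Theorem 3.3 The instrumented interpreter $\mathcal{E}_I$ is extensive with respect to stores.


$$
\begin{array}{l}
\forall e \in E,\ \forall \alpha \in AL,\ \forall \rho \in Env,\ \forall \sigma_0 \in Store, \\
\quad (v, \sigma_1, A^+, A^-, A^R) = \mathcal{E}_I[\![ e ]\!]\rho\sigma_0\alpha \;\Rightarrow\; \sigma_0 \sqsubseteq \sigma_1
\end{array}
$$


Proof:
Similar to the proof of the extensiveness of the standard interpreter. $\Box$
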
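Theorem 3.4 Interpreter function $\mathcal{E}_I$ is monotonic with respect to the context:


$$
\begin{array}{l}
\forall e \in E,\ \forall \alpha \in AL,\ \forall \rho_0, \rho_1 \in Env,\ \forall \sigma_0, \sigma_1 \in Store, \\
\quad \rho_0 \sqsubseteq \rho_1 \wedge \sigma_0 \sqsubseteq \sigma_1 \;\Rightarrow\; \mathcal{E}_I[\![ e ]\!]\rho_0\sigma_0\alpha \sqsubseteq \mathcal{E}_I[\![ e ]\!]\rho_1\sigma_1\alpha
\end{array}
$$


Proof:


Similar to the proof of the monotonicity of the standard interpreter. $\Box$


$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{l}MakeTuple(se_1, \ldots, se_m) \,]\!]\rho\sigma\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ \ \vdots \\
\quad\ \ v_m = \mathcal{SE}[\![ se_m ]\!]\rho; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v_{tuple} = (Tuple\ v_1, \ldots, v_m); \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{tuple}]; \\
\quad\ \ \mathrm{in}\ (ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset)\ \} \\
\\
\mathcal{E}_I[\![\, Select_i(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (Tuple\ v_1, \ldots, v_m) = \sigma[ol]; \\
\quad\ \ \mathrm{in}\ (v_i, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \}
\end{array}
$$


Figure 3.6: Evaluation of tuple primitives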


3.3 Interpretation of Some Examples


In this section we will evaluate a couple of examples under the instrumented interpreter to illustrate 
its behavior. 


3.3.1 Interpretation of a Non-Recursive Example 


We will start with a non-recursive example: 


{ def f(w) =
    ^{k_2}{ t = ^{k_0}g(w);
            a = Select_1(t);
            b = Select_2(t);
            r = (a * b);
          in r };
  def g(x) =
    ^{k_3}{ y = (x - 21);
            t = ^{l_0}MakeTuple(x,y);
          in t }
  def f_0 () =
    ^{k_1}f(68);
};


If we execute the program using the instrumented interpreter, we get the following call tree: 


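$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{k}MakeArray_{f_j}(se_0, se_1, \ldots, se_p) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \alpha : k; \\
\quad\ \ n = \mathcal{SE}[\![ se_0 ]\!]\rho; \\
\quad\ \ v_1 = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ \ \vdots \\
\quad\ \ v_p = \mathcal{SE}[\![ se_p ]\!]\rho; \\
\quad\ \ \alpha_0 = \alpha.k.(0); \\
\quad\ \ (u_0, \sigma_0, A^+_0, A^-_0, A^R_0) = \mathcal{E}_I[\![ e_j ]\!]\,\bot_{Env}[0/x_0, v_1/x_1, \ldots, v_p/x_p]\,\sigma\,\alpha_0; \\
\quad\ \ \ \vdots \\
\quad\ \ \alpha_{n-1} = \alpha.k.(n-1); \\
\quad\ \ (u_{n-1}, \sigma_{n-1}, A^+_{n-1}, A^-_{n-1}, A^R_{n-1}) = \mathcal{E}_I[\![ e_j ]\!]\,\bot_{Env}[(n-1)/x_0, v_1/x_1, \ldots, v_p/x_p]\,\sigma\,\alpha_{n-1}; \\
\quad\ \ \sigma' = [ol \mapsto (Array\ n, u_0, \ldots, u_{n-1})] \sqcup (\bigsqcup_i \sigma_i); \\
\quad\ \ A^+ = \{(ol, \alpha)\} \cup (\bigcup_i A^+_i); \\
\quad\ \ A^- = \bigcup_i A^-_i; \\
\quad\ \ A^R = \bigcup_i A^R_i; \\
\quad\ \ \mathrm{in}\ (ol, \sigma', A^+, A^-, A^R)\ \} \\
\quad \text{where } f_j(x_0, x_1, \ldots, x_p) = e_j \text{ is a definition in the program} \\
\\
\mathcal{E}_I[\![\, {}^{k}Fetch(se_1, se_2) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ i = \mathcal{SE}[\![ se_2 ]\!]\rho; \\
\quad\ \ (Array\ n, v_0, \ldots, v_{n-1}) = \sigma[ol]; \\
\quad\ \ \mathrm{in}\ (v_i, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \} \\
\\
\mathcal{E}_I[\![\, {}^{k}Bounds(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (Array\ n, v_0, \ldots, v_{n-1}) = \sigma[ol]; \\
\quad\ \ \mathrm{in}\ (n, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \}
\end{array}
$$


Figure 3.7: Instrumented evaluation of array primitives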


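[Call tree figure: nodes labeled with invoking expressions and activation labels; $t = \epsilon.k_1.k_2.k_0.k_3 : l_0$]


$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{l}MakeOneof_{tag,ntags}(se_1, \ldots, se_m) \,]\!]\rho\sigma\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ \ \vdots \\
\quad\ \ v_m = \mathcal{SE}[\![ se_m ]\!]\rho; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v_{oneof} = (tag, ntags\ v_1, \ldots, v_m); \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{oneof} \sqcup \sigma[ol])]; \\
\quad\ \ \mathrm{in}\ (ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset)\ \} \\
\\
\mathcal{E}_I[\![\, Is_{tag}?(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (tag', ntags\ v_1, \ldots, v_m) = \sigma[ol]; \\
\quad\ \ b = \mathrm{if}\ tag = tag'\ \mathrm{then}\ True\ \mathrm{else}\ False; \\
\quad\ \ \mathrm{in}\ (b, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \} \\
\\
\mathcal{E}_I[\![\, Select_{tag,i}(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (tag', ntags\ v_1, \ldots, v_m) = \sigma[ol]; \\
\quad\ \ v = \mathrm{if}\ tag = tag'\ \mathrm{then}\ v_i\ \mathrm{else}\ \bot; \\
\quad\ \ \mathrm{in}\ (v, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \}
\end{array}
$$


Figure 3.8: Instrumented evaluation of oneof primitives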


Each node in the call tree is labeled with the expression that invoked the procedure corresponding
to that node and with the activation label of that node. We also show the binding of variable t in
procedure f to the tuple allocated within g, labeled $\epsilon.k_1.k_2.k_0.k_3 : l_0$.


The result under the instrumented semantics is: 


$$
\begin{array}{l}
(\ 3196, \\
\ \ \bot_{Store}[\epsilon.k_1.k_2.k_0.k_3 : l_0 \mapsto (Tuple\ 68, 47)], \\
\ \ \{(\epsilon.k_1.k_2.k_0.k_3 : l_0,\ \epsilon.k_1.k_2.k_0.k_3)\}, \\
\ \ \emptyset, \\
\ \ \{(\epsilon.k_1.k_2.k_0.k_3 : l_0,\ \epsilon.k_1.k_2)\}\ )
\end{array}
$$


The first component of the result indicates that the answer was the number 3196. The second
component of the result, the store, indicates the store at the end of the evaluation of the example.
The third component indicates that a single location, $\epsilon.k_1.k_2.k_0.k_3 : l_0$, was allocated, the fourth
component indicates that no locations were deallocated, and the final component indicates that
location $\epsilon.k_1.k_2.k_0.k_3 : l_0$ was dereferenced during execution in activation $\epsilon.k_1.k_2$.


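In this program, the lifetime of the tuple that g allocates is from the time that g allocated the tuple
until procedure f terminates, because that is the last time that there is a pointer to the tuple. We
can also say that the lifetime of the tuple labeled $\epsilon.k_1.k_2.k_0.k_3 : l_0$ is bounded by the lifetime of
activation $\epsilon.k_1.k_2$.


$$
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{l}Cons(se_1, se_2) \,]\!]\rho\sigma\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ v_2 = \mathcal{SE}[\![ se_2 ]\!]\rho; \\
\quad\ \ v_{list} = (Cons\ v_1, v_2); \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{list}]; \\
\quad\ \ \mathrm{in}\ (ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset)\ \} \\
\\
\mathcal{E}_I[\![\, Hd(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (Cons\ v_1, v_2) = \sigma[ol]; \\
\quad\ \ \mathrm{in}\ (v_1, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \} \\
\\
\mathcal{E}_I[\![\, Tl(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ (Cons\ v_1, v_2) = \sigma[ol]; \\
\quad\ \ \mathrm{in}\ (v_2, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \} \\
\\
\mathcal{E}_I[\![\, {}^{l}Nil() \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \alpha : l; \\
\quad\ \ v_{list} = (Nil); \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{list}]; \\
\quad\ \ \mathrm{in}\ (ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset)\ \} \\
\\
\mathcal{E}_I[\![\, Nil?(se) \,]\!]\rho\sigma\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ b = \mathrm{if}\ \sigma[ol].tag = Nil\ \mathrm{then}\ True\ \mathrm{else}\ False; \\
\quad\ \ \mathrm{in}\ (b, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\})\ \}
\end{array}
$$


Figure 3.9: Instrumented evaluation of list primitives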


3.3.2 Interpretation of a Recursive Example 


Now let us evaluate a recursive example. 


{ def foo(t) =
    ^{k_2}{ a = Select_1(t);
            b = Select_2(t);
            p = (a == 5);
            r = if p then a
                else ^{k_3}{ t' = ^{l_0}MakeTuple(5,7);
                             v = ^{k_0}foo(t');
                           in v }
          in r }
  def f_0 () =
    ^{k_4}{ t0 = ^{l_1}MakeTuple(3,4)
            result = ^{k_1}foo(t0)
          in result }
}


Evaluation of this program under the instrumented interpreter yields the following call tree: 


The result under the instrumented semantics is:

$$
\begin{array}{l}
(\ 5, \\
\ \ \bot_{Store}[\epsilon.k_4.k_1.k_2.k_3 : l_0 \mapsto (Tuple\ 5, 7),\ \epsilon.k_4 : l_1 \mapsto (Tuple\ 3, 4)], \\
\ \ \{(\epsilon.k_4.k_1.k_2.k_3 : l_0,\ \epsilon.k_4.k_1.k_2.k_3),\ (\epsilon.k_4 : l_1,\ \epsilon.k_4)\}, \\
\ \ \emptyset, \\
\ \ \{(\epsilon.k_4 : l_1,\ \epsilon.k_4.k_1.k_2),\ (\epsilon.k_4.k_1.k_2.k_3 : l_0,\ \epsilon.k_4.k_1.k_2.k_3.k_0.k_2)\}\ )
\end{array}
$$


which shows that the result was the number 5 and that two tuples, labeled $\epsilon.k_4 : l_1$ and
$\epsilon.k_4.k_1.k_2.k_3 : l_0$, were allocated. Neither of these tuples is reachable from the result, and no
tuples were deallocated. Both labels were dereferenced: the object labeled $\epsilon.k_4 : l_1$ was dereferenced
in activation $\epsilon.k_4.k_1.k_2$ and the object labeled $\epsilon.k_4.k_1.k_2.k_3 : l_0$ was dereferenced in activation
$\epsilon.k_4.k_1.k_2.k_3.k_0.k_2$.


3.4 Object Deallocation Safety Condition 


A number of definitions are needed before we can give a safety condition for object deallocations.
Given a denotable value $v$ and a store $\sigma$, we must be able to determine the labels of the objects
reachable from value $v$ in store $\sigma$. The following definition specifies which objects are reachable from
a given dynamic value and store.


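Definition 3.5 (Object Reachability) $Reachable(v, \sigma)$, the set of labels of objects reachable
from value $v$ with respect to store $\sigma$, is defined as follows:


$$
\begin{array}{rcl}
Reachable(\bot, \sigma) & = & \emptyset \\
Reachable(n, \sigma) & = & \emptyset \\
Reachable(b, \sigma) & = & \emptyset \\
Reachable(ol, \sigma) & = & \{ol\} \cup SVReachable(\sigma[ol], \sigma) \\
\\
SVReachable(\bot, \sigma) & = & \emptyset \\
SVReachable((Tuple\ v_1, \ldots, v_n), \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable((Array\ n, v_0, \ldots, v_{n-1}), \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable((tag, n\ v_1, \ldots, v_m), \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable((Cons\ v_1, v_2), \sigma) & = & Reachable(v_1, \sigma) \cup Reachable(v_2, \sigma) \\
SVReachable((Nil), \sigma) & = & \emptyset
\end{array}
$$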
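Reachability can be sketched as a graph traversal over the store. The sketch below assumes that object labels are instances of a hypothetical `OL` class, that the store is a dictionary from labels to Python tuples whose first element is a constructor name, and that any component that is not an `OL` (numbers, booleans, tags, or `None` for $\bot$) contributes nothing. The visited set is an implementation detail added here so that cyclic structures terminate; it computes the least set satisfying the equations above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OL:
    """Hypothetical object label: an activation label paired with an expression label."""
    activation: Tuple[str, ...]
    label: str


def reachable(value, store, seen=None):
    """Return the set of object labels reachable from value through store."""
    seen = set() if seen is None else seen
    if not isinstance(value, OL) or value in seen:
        return seen                       # non-pointers reach nothing
    seen.add(value)
    for component in store.get(value, ()):  # a missing entry acts as bottom
        reachable(component, store, seen)
    return seen


a = OL((), "l0")
b = OL(("k1",), "l1")
c = OL(("k1",), "l2")
store = {
    a: ("Tuple", 1, b),
    b: ("Cons", 2, a),    # points back at a, forming a cycle
    c: ("Tuple", 3),
}
print(reachable(a, store) == {a, b})
```

Only pointer components are followed, so the constructor name and any integer or boolean fields are skipped without special cases.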


We also need to know what objects are reachable from the context surrounding an expression. We 
will call these objects the inherited objects. These are the objects that an expression can use that 
were allocated outside of the expression. 


Definition 3.6 (Inherited Objects) The function $Inherited(e, \rho, \sigma)$ returns the set of labels of
objects reachable from $FV(e)$ given environment $\rho$ and store $\sigma$:


$$Inherited(e, \rho, \sigma) = \bigcup_{w \in FV(e)} Reachable(\rho[w], \sigma)$$


Remember that if variable $w$ is unbound in environment $\rho$, then $\rho[w]$ is $\bot$.


Previously, we defined a dangling reference, or dangling pointer, to be a pointer that was deref- 
erenced after it was deallocated. A pointer will also be considered dangling if the activation in 
which the object is deallocated may terminate before the activation in which the object is allocated 
(because an allocation is another form of dereferencing a pointer). To be more precise, the activa- 
tion in which an object is allocated or dereferenced must always terminate before the activation in 
which the object is deallocated. 


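Definition 3.7 (Dangling Reference) For a program $pr$, let


$$
\begin{array}{rcl}
(v, \sigma, A^+, A^-, A^R) & = & \mathcal{PE}_I[\![ pr ]\!] \\
R & = & Reachable(v, \sigma)
\end{array}
$$


Then $\mathcal{DP}[\![ pr ]\!]$, the set of dangling pointers after the execution of program $pr$, is defined by:


$$
\mathcal{DP}[\![ pr ]\!] = \left\{ (ol, \alpha_-) \;\middle|\;
\begin{array}{l}
(ol, \alpha_-) \in A^- \\
\wedge\ (\, ol \in R \\
\quad \vee\ \exists \alpha_r.\ ((ol, \alpha_r) \in A^R \wedge \neg(\alpha_r \preceq_T \alpha_-)) \\
\quad \vee\ \exists \alpha_+.\ ((ol, \alpha_+) \in A^+ \wedge \neg(\alpha_+ \preceq_T \alpha_-))\,)
\end{array}
\right\}
$$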

The set of dangling pointer events is the set of all pairs of object labels $ol$ and activation labels
$\alpha_-$ such that $ol$ is deallocated in activation $\alpha_-$ and: (1) a reference to $ol$ is returned as part of
the result of the program, (2) there is a reference to $ol$ in some activation $\alpha_r$ that does not terminate
before activation $\alpha_-$, or (3) the activation in which $ol$ was allocated does not terminate before
activation $\alpha_-$. Any one of these conditions counts as a dangling pointer error.
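The three cases can be checked mechanically once the event sets are known. The following sketch assumes activation labels are Python tuples, object labels are `(activation, label)` pairs, and event sets are Python sets of `(object label, activation)` pairs; the function names are hypothetical. The labels come from the non-recursive example of Section 3.3.1, but the deallocation events are invented for illustration, since that program contains no deallocation commands.

```python
def terminates_before(a0, a1):
    """a0 terminates before a1 when a1 is a prefix of a0."""
    return a0[:len(a1)] == a1


def dangling_pointers(reachable_from_result, allocs, deallocs, derefs):
    """Return the deallocation events that are dangling pointer errors."""
    dangling = set()
    for ol, a_minus in deallocs:
        escapes = ol in reachable_from_result                     # case (1)
        late_use = any(o == ol and not terminates_before(a, a_minus)
                       for o, a in set(derefs) | set(allocs))     # cases (2), (3)
        if escapes or late_use:
            dangling.add((ol, a_minus))
    return dangling


g_block = ("k1", "k2", "k0", "k3")   # activation of the block inside g
f_block = ("k1", "k2")               # activation of the block inside f
t = (g_block, "l0")                  # the tuple allocated by g
allocs = {(t, g_block)}
derefs = {(t, f_block)}

# Hypothetical Dealloc inside g: f still dereferences t afterwards.
too_early = dangling_pointers(set(), allocs, {(t, g_block)}, derefs)
# Hypothetical Dealloc at the end of f's block: every use terminates first.
in_f = dangling_pointers(set(), allocs, {(t, f_block)}, derefs)
print(too_early, in_f)
```

Deallocating in g's block is flagged because the Select operations run in the ancestor activation of f, while deallocating in f's block produces an empty set.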


The deallocation of an object $ol$ in program $pr$ at activation label $\alpha_-$ is considered correct if the
pair $(ol, \alpha_-)$ does not show up in the set of dangling pointer events resulting from the execution
of program $pr$.


Condition 3.8 (Deallocation Correctness) The deallocation of object $ol$ upon termination of
activation $\alpha_-$ is correct if the following condition holds:


$$(ol, \alpha_-) \notin \mathcal{DP}[\![ pr ]\!]$$


where $pr$ is the program.


Condition 3.8 is exact, in that any deallocation command that does not lead to dangling pointer 
errors will be considered correct. However, we have to execute the whole program before we can 
determine if any deallocation command is correct. 


The reason we cannot verify Condition 3.8 as we evaluate each letrec block is that an object 
may be deallocated in some letrec block, and returned as part of the result of that block. This 
deallocation is correct as long as no attempt is ever made to dereference the object once it has 
been deallocated. To be more precise, the letrec block corresponds to one control region, or 
activation, and we can deallocate the structure in this control region as long as the structure is 
never dereferenced in a control region that is an ancestor of this one. 


When we verify that deallocation commands are correct, we are willing to be a little less precise 
and to only accept deallocation commands that deallocate objects in the highest control region 
from which the objects are reachable. This property we call safety — if a deallocation command is 
safe, then it is guaranteed to be correct, although some correct deallocation commands are unsafe. 


There are two reasons to use the deallocation safety condition rather than the deallocation correct- 
ness condition when we test deallocation commands. One is that safety is a local property, and so 
this allows us to verify the safety of a deallocation command in a procedure without considering 
all of the places in which the procedure might be called. This point is especially important if 
the algorithm is to be generalized to an environment including separate compilation. The second 
reason is that the simplest version of our abstract interpreter summarizes all activation labels by 
the empty activation label, and so we cannot tell the relative ordering of subexpressions. 


An object deallocation is safe, or guaranteed to be correct, if the deallocation occurs in the highest 
dynamic context from which the object is reachable. 


Condition 3.9 (Object Deallocation Safety) It is safe to deallocate object $ol$ in context
$(\rho_-, \sigma_-, a_-)$, where

$$(v, \sigma', \Delta^+, \Delta^-, \Delta^R) = \mathcal{E}_I[\![ e ]\!]\,\rho_-\,\sigma_-\,a_-$$
$$R = \mathit{Reachable}(v, \sigma')$$

if the following condition holds:

$$
\begin{array}{l}
(ol, a_-) \in \Delta^- \\
\wedge\ ol \notin R \\
\wedge\ \forall (ol, a_r) \in \Delta^R.\ (a_r \preceq_T a_-) \\
\wedge\ \forall (ol, a_+) \in \Delta^+.\ (a_+ \preceq_T a_-)
\end{array}
$$

and if there is only one deallocation of $ol$, which is in activation $a_-$.


This condition is correct whenever an object is deallocated in the highest control region from which 
it is reachable. For instance, it is always safe to deallocate an object that is not part of the result 
of a program upon termination of the main procedure $f_0$, although this may not be of much use.


Unlike Condition 3.8, Condition 3.9 can be checked at the time the deallocate is performed by 
examining the current program state: the environments of enclosing contexts and the objects 
reachable from those contexts and the current block expression. For this reason, this condition will 
be used in Chapters 4 and 5 to develop a static analysis for verifying and inserting deallocation 
commands. 


Theorem 3.10 (Deallocation Safety Theorem) If an object deallocation satisfies Condition 3.9
(Deallocation Safety), then it satisfies Condition 3.8 (Deallocation Correctness). 


Proof sketch: If an object ol is deallocated in activation $a_-$, the highest activation from
which ol is reachable, then the allocation of ol and all dereferencing of ol must take place in
activations labeled $a_r$ such that each $a_r$ terminates before $a_-$. ∎


In the next two chapters, we abstract the instrumented interpreter and restate the safety condition 
in terms of the abstract interpreter. The next two chapters restrict storable values to include only 
tuples so that we can concentrate on the process of abstraction and how to state and test the 
deallocation safety condition. Later in the thesis we add the abstraction of other types of objects 
to our abstract interpreter. 
Chapter 4


Abstracted Semantics 


The purpose of abstract interpretation is to capture information about the execution of an
expression or program over all possible data. We summarize, or abstract, the values produced by a
program over all of its executions. For instance, if a program evaluates to a number under the
standard or instrumented interpreters, our abstract interpreter summarizes its result as N,
meaning any number. Similarly, our abstract interpreter evaluates both branches of every
conditional in order to summarize the behavior of the conditional. In this way, the behavior over
all control paths and over all data can be approximated.


The abstract semantics that we use captures information about the shape and identities of objects
that are allocated and the dynamic reachability of these objects from the variables and structures
to which they are bound. In Chapter 5, we use this information about reachability and object
identities to develop algorithms to verify the safety of deallocation commands and to insert
deallocation commands in KID~ programs.


In the rest of this chapter, we develop an abstracted interpreter that summarizes the behavior of 
programs in such a way that we can determine object lifetimes. In the first section, we briefly 
describe how the abstract interpreter is used. In the second section, we define the abstract value 
domains for this abstract interpreter. In the third section, we describe the evaluation strategy 
used by our abstract interpreter. In the fourth section, we define the abstract interpreter itself. In 
the final section, we show some examples of using this interpreter to determine that lifetimes of 
particular objects are bounded by the lifetimes of given procedure invocations. 


4.1 Using the Abstract Interpreter 


Our abstract interpreter does not directly yield lifetime information. It computes the shape of 
the objects that a program may allocate and how they may be interconnected rather than the 
actual values that may fill those objects. However, a lifetime analyzer uses the connectivity, or 
reachability, information to determine the approximate lifetimes of objects. This intuition led to
the development of our abstract interpreter and is also the reason why our abstract interpreter is 
general enough to be used for other analyses. 


To perform lifetime analysis on a procedure, we need to know all the possible values to which each 
variable may be bound and all possible values each object may contain. The questions we ask to 
determine object lifetimes are, “When is the first possible time that this object is reachable from 
the running program?” and, “When is the last possible time that this object is reachable from the
running program?” Thus, we must be able to know all possible places from which we can reference 
an object. In the remainder of this section we discuss the precision of this reachability information 
and how to use that information to determine that deallocation commands are safe. 


4.1.1 Precision of Information 


The reachability information generated by the abstract interpreter is approximate. However, the 
imprecision is asymmetrical — a negative result is definite, while a positive result is indefinite. 
The most precise fact we can determine is that an object is not reachable from a given variable. 
If the abstract interpreter determines that an abstract object is not reachable from the result of a 
procedure invocation, then under no circumstances will that object be reachable during execution 
of the procedure invocation under the standard interpreter. We must be very careful to base all
of our decisions on precise negative information rather than approximate positive information. For
example, if we determine that variable x may be bound to some set of abstract objects labeled ls,
then x may be bound to $\bot$, or to one of the locations in ls, but x will definitely not be bound
to a location outside of ls.


Given this insight into the kinds of questions we may ask about object reachability, let us reexamine 
the three conditions that an object must satisfy in order to be safely deallocated within a dynamic 
context. First, the object must have been allocated within the context. In other words, the object 
must not have been inherited, or passed in from a surrounding context. We verify this by testing 
that the object cannot be reached from a surrounding context. Since we are talking about the 
binding of a variable, we must actually test that none of the objects to which the variable could 
be bound can be reached from a surrounding context. Next, the object must not escape from this 
context. We verify this by testing that none of the object labels to which this variable could be 
bound are reachable from the result of the context of interest. Finally, this object must not be 
deallocated more than once. We test this by verifying that none of the object labels to which this 
identifier may be bound are in the set of object labels that may be deallocated by other deallocation 
commands. 


4.1.2 The Abstract Deallocation Safety Condition 


The canonical form of a block expression is shown below: 


    { x_0 = e_0;
      ...
      x_{n-1} = e_{n-1};
      Dealloc(y_0);
      ...
      Dealloc(y_{m-1});
    in x }


We use $x_i$ for the names of the bound variables, $y_i$ for the variables in deallocation commands,
and $w_i$ as the free variables of a letrec expression. Here, each variable $x_i$ is bound in the
block expression. The object to which each of the $y_i$'s is bound will be deallocated once the
bindings have completely evaluated, and the value of x is returned as the result of the block
expression.
The context in which an expression is evaluated provides the environment, store, and activation
label in which the expression is executed. Let us consider a block expression e evaluated in the
standard context $c = (\rho, \sigma, a)$. If there are deallocation commands for $y_1, \ldots, y_n$
in the top level of the block expression, then we can verify that these deallocation commands are
correct under the standard interpreter as follows. First, we determine what locations are passed
into e from the context. Call this set I.


$$I = \bigcup_{w \in FV(e)} \mathit{Reachable}(\rho[w], \sigma) \qquad (4.1)$$

We can use either $\sigma$ or $\sigma'$ here because the language is functional. If side effects
were added we would have to use $\sigma'$, although we would still use environment $\rho$.


We evaluate the bindings of the block expression, yielding the environment and store for the
evaluation of the body of the expression; call these $\rho'$ and $\sigma'$. The resulting value of
this evaluation is $(\rho'[x], \sigma')$, where variable x is the result of the block expression.


Now, we must also determine the set R, which is the set of objects reachable from the result of the 
evaluation of e in context c. 


$$R = \mathit{Reachable}(\rho'[x], \sigma') \qquad (4.2)$$

where x is the result of the block expression above.
Given this exact information from the standard evaluator ($\rho$, $\sigma$, I, R, $\rho'$, and
$\sigma'$), we can determine that it is safe to execute each of the deallocation commands in e in a
particular dynamic context.
$$
\mathit{Safe?}[\![ \mathit{Dealloc}(y_i) ]\!] \;=\;
\rho'[y_i] \notin I
\;\wedge\; \rho'[y_i] \notin R
\;\wedge\; \bigwedge_{j \neq i} \rho'[y_j] \neq \rho'[y_i]
\qquad (4.3)
$$

In other words, each deallocation is guaranteed to be correct if the value of $y_i$ (the object
being deallocated) is not inherited from the context (it must have been allocated within e), is not
returned as part of the result of e, and is not deallocated by any other deallocation command.


In order to verify deallocation safety in the abstract interpreter, we must perform a similar test.
So, starting with an abstract context $(\rho, \sigma, a)$, we must evaluate the bindings of e to
obtain the environment and store ($\rho'$ and $\sigma'$) of the body of the block expression, and
then compute sets I and R:

$$I = \bigcup_{w \in FV(e)} \mathit{Reachable}(\rho[w], \sigma) \qquad (4.4)$$
$$R = \mathit{Reachable}(\rho'[x], \sigma') \qquad (4.5)$$


Given all of these abstract values, we can conservatively determine safety using the following
procedure:

$$
\mathit{Safe?}[\![ \mathit{Dealloc}(y_i) ]\!] \;=\;
\rho'[y_i] \cap I = \emptyset
\;\wedge\; \rho'[y_i] \cap R = \emptyset
\;\wedge\; \bigwedge_{j \neq i} \rho'[y_j] \cap \rho'[y_i] = \emptyset
\qquad (4.6)
$$

Note that instead of testing for object ol not being in sets I and R, we now must test that none of
the labels in the value of $y_i$ are in sets I or R. Also, we must test for pairwise disjointness of
the values to be deallocated rather than testing for pairwise inequality of object labels.
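

As a rough illustration of test (4.6), the following Python sketch checks a group of deallocation
commands against precomputed abstract sets. The function name `safe_deallocs`, the use of strings
as object labels, and the dictionary standing in for the environment restricted to the deallocated
variables are assumptions made for this example; they are not taken from the thesis implementation.

```python
from typing import Dict, FrozenSet

Labels = FrozenSet[str]


def safe_deallocs(values: Dict[str, Labels],
                  inherited: Labels,
                  result: Labels) -> Dict[str, bool]:
    """Apply the abstract safety test to every Dealloc(y_i) of one block.

    values    -- hypothetical stand-in for rho'[y_i]: label set of each y_i
    inherited -- the set I of labels reachable from the free variables
    result    -- the set R of labels reachable from the block result
    """
    verdict = {}
    for y, labels in values.items():
        not_inherited = labels.isdisjoint(inherited)
        not_escaping = labels.isdisjoint(result)
        # Pairwise disjointness with every other deallocated variable.
        deallocated_once = all(
            labels.isdisjoint(other)
            for z, other in values.items()
            if z != y
        )
        verdict[y] = not_inherited and not_escaping and deallocated_once
    return verdict


example = safe_deallocs(
    values={
        "y0": frozenset({"l0"}),
        "y1": frozenset({"l1"}),
        "y2": frozenset({"l1", "l2"}),
    },
    inherited=frozenset({"l3"}),
    result=frozenset({"l0"}),
)
```

In this example `y0` is rejected because its label may escape through the result, while `y1` and
`y2` are rejected because their label sets overlap, so the same object might be deallocated twice.
The test only ever relies on labels being absent from a set, which is the definite (negative) kind
of information the abstract interpreter provides.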


In Chapter 5, we develop algorithms for verifying and inserting object deallocation commands. In 
that chapter we see how all of the necessary values are computed using the abstract interpreter. 
4.2 Abstracting the Semantic Domains


The abstract interpreter is supposed to allow us to compute or approximate the value of a useful 
property of a program. We are interested in knowing which objects can be reached from each of 
the variables in the program. 


4.2.1 Abstract Domains 


Figure 4.1 contains the definition of the domains used by the abstract interpreter. We describe 
these domains in more detail in the remainder of this section. 


Activation Labels 


We summarize all activation labels in the standard domain of activation labels by the empty
activation label $\epsilon$. This abstraction of activation labels is the most extreme way of
ensuring a finite domain. In Chapter 9 we investigate more precise abstractions of this domain.


The domain L is the set of static labels attached to expressions in a program. This domain is
finite; its size is determined by the number of MakeTuple expressions appearing in the program.


Object Labels 


Abstract object labels are composed of an abstract activation label and a static expression label.
Since both the AL and L domains are finite, the domain of object labels OL must also be finite.


Under abstract interpretation, a variable may have a set of objects to which it may be bound
because execution of an expression in different contexts may bind variables to different objects.
Thus, object references must be sets of object labels.


Denotable Values 


The domains N and B of integers and booleans have been compressed to a single element each 
because we are uninterested in the actual values computed — only in the shape and connectedness 
of the values computed. 


Values are either scalars, e.g., integers or booleans, or references to aggregates, e.g., tuples. An 
aggregate value consists of a reference to the tuple and a store containing the value of the aggregate. 
A reference consists of a set of object labels ls. The domain V of denotable values therefore
consists of the sum of abstract integers, booleans and sets of object labels, all lifted over a
bottom element $\bot$. Note that no objects are reachable from $\bot$.


Stores 


Stores map individual labels to tuples. Location ol being unbound in a store $\sigma$ is the same as
having ol bound to $\bot$ in $\sigma$. In the abstract semantics, we use sets of labels ls as
references to an object. We dereference such a set of labels as follows:

$$\langle \mathit{Tuple}\ e_1, \ldots, e_n \rangle = \bigsqcup_{ol \in ls} \sigma[ol]$$
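$$
\begin{array}{rcll}
n \in N & = & N & \text{Numbers} \\
b \in B & = & B & \text{Booleans} \\
a \in AL & = & \epsilon & \text{Abstract Activation Labels} \\
l \in L & = & \{l_0, l_1, l_2, \ldots\} & \text{Expression Labels} \\
ol \in OL & = & AL : L & \text{Object Labels} \\
ls \in Ls & = & \mathcal{P}(OL) & \text{Object Label Sets} \\
v \in V & = & (N + B + Ls)_\bot & \text{Denotable Values} \\
v_{tuple} \in \mathit{Tuple} & = & \langle \mathit{Tuple}\ V, \ldots, V \rangle & \text{Tuples} \\
\sigma \in \mathit{Store} & = & OL \rightarrow \mathit{Tuple} & \text{Stores}
\end{array}
$$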


Figure 4.1: Abstract value domains 


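$$
\begin{array}{rcl}
\mathit{Abs}(\bot) & = & \bot \\
\mathit{Abs}_{AL}(a) & = & \epsilon \\
\mathit{Abs}_{OL}(a : l) & = & \epsilon : l \\
\mathit{Abs}_{Ls}(ls) & = & \bigcup_{ol \in ls} \{ \mathit{Abs}_{OL}(ol) \} \\
\mathit{Abs}_{V}(v) & = &
  \begin{cases}
  \bot & \text{if } v = \bot \\
  N & \text{if } v \text{ is a number} \\
  B & \text{if } v \text{ is a boolean} \\
  \{ \mathit{Abs}_{OL}(v) \} & \text{if } v \text{ is a location} \\
  \top & \text{otherwise}
  \end{cases} \\
\mathit{Abs}_{Tuple}(\langle \mathit{Tuple}\ v_1, \ldots, v_n \rangle) & = &
  \langle \mathit{Tuple}\ \mathit{Abs}_{V}(v_1), \ldots, \mathit{Abs}_{V}(v_n) \rangle \\
\mathit{Abs}_{Store}(\sigma) & = &
  \bigsqcup_{ol \in OL} \bot_{Store}[\mathit{Abs}_{OL}(ol) \mapsto \mathit{Abs}_{Tuple}(\sigma[ol])]
\end{array}
$$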


Additional abstraction operators we require are defined below: 


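$$
\begin{array}{rcl}
\mathit{Abs}_{AEV}(\Delta^+) & = & \emptyset \\
\mathit{Abs}_{DEV}(\Delta^-) & = & \{ \mathit{Abs}_{OL}(ol) \mid \forall (ol, a) \in \Delta^- \} \\
\mathit{Abs}_{DEV}(\Delta^R) & = & \emptyset
\end{array}
$$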


Figure 4.2: Definition of the abstraction functions 


We are determining the tuple to which store $\sigma$ maps the object labels in ls by taking the
least upper bound of the tuples to which $\sigma$ maps each label in ls.


Please remember that denotable values that are object references, or labels, are meaningless without 
an associated store. Although the labels themselves are very important in this semantics, the 
true meaning of a denotable value is tied to the object in the store named by the value’s set of 
labels. Similarly, the set of labels of objects allocated and deallocated only has any meaning when 
accompanied by a store in which the allocated and deallocated objects reside. 


Abstraction Functions 
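$$
\begin{array}{rcl}
v \sqcup \bot & = & v \\
\bot \sqcup v & = & v \\
ls_1 \sqcup_{Ls} ls_2 & = & ls_1 \cup ls_2 \\
v_1 \sqcup_{V} v_2 & = &
  \begin{cases}
  N & \text{if } v_1, v_2 \text{ are both numbers} \\
  B & \text{if } v_1, v_2 \text{ are both booleans} \\
  v_1 \cup v_2 & \text{if } v_1, v_2 \text{ are both in } Ls \\
  \top & \text{otherwise}
  \end{cases} \\
\langle \mathit{Tuple}\ v_1, \ldots, v_{n_1} \rangle \sqcup_{Tuple}
\langle \mathit{Tuple}\ w_1, \ldots, w_{n_2} \rangle & = &
  \begin{cases}
  \langle \mathit{Tuple}\ (v_1 \sqcup_{V} w_1), \ldots, (v_n \sqcup_{V} w_n) \rangle
    & \text{if } n_1 = n_2 \\
  \top & \text{otherwise}
  \end{cases} \\
\sigma_1 \sqcup_{Store} \sigma_2 & = & \lambda ol.
  \begin{cases}
  \sigma_1[ol] \sqcup_{Tuple} \sigma_2[ol] & \text{if } ol \in OL \\
  \bot & \text{otherwise}
  \end{cases}
\end{array}
$$
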
Figure 4.3: Least upper bound operators on value domains 


Figure 4.2 contains the definitions of the abstraction functions that map values in the standard 
domains to values in the abstract domains. We need these functions in order to show the correctness 
of the abstract interpreter. 
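

The abstraction functions can be made concrete with a small Python sketch. It assumes a
hypothetical encoding that is not part of the thesis: a standard object label is a pair
`(activation, label)`, standard values are Python integers, booleans, `None` (for bottom) or such
pairs, and abstract scalars are the strings `"N"`, `"B"`, `"⊥"` and `"⊤"`. The point it shows is
that two standard objects allocated by the same MakeTuple expression in different activations
collapse onto one abstract label, and their contents are joined.

```python
BOT, NUM, BOOL, TOP, EPS = "⊥", "N", "B", "⊤", "ε"


def abs_ol(ol):
    """Abs_OL: replace the activation label by the empty label ε."""
    _activation, label = ol
    return (EPS, label)


def abs_value(v):
    """Abs_V: map a standard value to its abstract summary."""
    if v is None:                 # None models ⊥ in this sketch
        return BOT
    if isinstance(v, bool):       # test bool first: bool is a subclass of int
        return BOOL
    if isinstance(v, int):
        return NUM
    if isinstance(v, tuple):      # a location, modelled as (activation, label)
        return frozenset({abs_ol(v)})
    return TOP


def lub_value(v1, v2):
    """Least upper bound on abstract values, used to merge store entries."""
    if v1 == BOT:
        return v2
    if v2 == BOT:
        return v1
    if v1 == v2 and v1 in (NUM, BOOL):
        return v1
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    return TOP


def abs_store(store):
    """Abs_Store: join the abstractions of all tuples that share a label."""
    out = {}
    for ol, fields in store.items():
        key = abs_ol(ol)
        new = tuple(abs_value(v) for v in fields)
        old = out.get(key)
        # Tuples from one MakeTuple site are assumed to have equal arity.
        out[key] = new if old is None else tuple(
            lub_value(a, b) for a, b in zip(old, new)
        )
    return out


concrete = {
    ("a1", "l0"): (5, ("a2", "l0")),   # allocated in activation a1
    ("a2", "l0"): (7, None),           # same site, activation a2
}
abstract = abs_store(concrete)
```

Here both standard objects map to the single abstract label `("ε", "l0")`, whose abstract tuple
holds a number and a reference back to that same label.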


4.2.2 Least Upper Bound Operators 


Figure 4.3 contains the definitions of the least-upper-bound operators on the abstract domains. 
The domains are all naturally ordered. 
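

The least-upper-bound operators translate almost directly into code. The Python sketch below is
only an illustration under assumed encodings (strings for the abstract scalars, `frozenset` for
label sets, Python tuples for abstract tuples, dictionaries for stores); the function names are
hypothetical. It also shows how a set of labels is dereferenced by joining the tuples bound to
each label.

```python
BOT, NUM, BOOL, TOP = "⊥", "N", "B", "⊤"


def lub_value(v1, v2):
    """Join of two denotable values."""
    if v1 == BOT:
        return v2
    if v2 == BOT:
        return v1
    if v1 == NUM and v2 == NUM:
        return NUM
    if v1 == BOOL and v2 == BOOL:
        return BOOL
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    return TOP


def lub_tuple(t1, t2):
    """Join of two tuples: elementwise when the arities agree, ⊤ otherwise."""
    if t1 == BOT:
        return t2
    if t2 == BOT:
        return t1
    if t1 == TOP or t2 == TOP or len(t1) != len(t2):
        return TOP
    return tuple(lub_value(a, b) for a, b in zip(t1, t2))


def lub_store(s1, s2):
    """Join of two stores; a label missing from a store is bound to ⊥."""
    return {
        ol: lub_tuple(s1.get(ol, BOT), s2.get(ol, BOT))
        for ol in s1.keys() | s2.keys()
    }


def deref(ls, store):
    """Dereference a label set: join the tuples bound to each label."""
    out = BOT
    for ol in ls:
        out = lub_tuple(out, store.get(ol, BOT))
    return out


store = {"l0": (NUM, BOT), "l1": (BOT, NUM)}
merged = deref(frozenset({"l0", "l1"}), store)
```

Representing an unbound label by a missing dictionary key mirrors the convention that an unbound
location is the same as one bound to bottom.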
4.2.3 Reachability


The abstract interpretation of a program yields a model of what objects are created and what 
objects are reachable from the bindings of a letrec block. Because the abstract interpreter sum- 
marizes information about all executions of an expression, we must represent references to objects 
as sets of abstract object labels. To restate the invariant on abstract reachability, we say that if
variable x is bound to a set of locations ls then, in any given execution, x can be bound to no other
locations. This reachability invariant is a constraint on the structure of the abstract object label
domain. The abstraction function $\mathit{Abs}_{OL}$ that maps object labels to abstract object labels must
enable us to preserve this constraint. 


We need a precise notion of reachable objects in the abstract domains. Given a denotable value
and a store, we must be able to determine which objects are reachable from that value and store.


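Definition 4.1 (Abstract Object Reachability) $\mathit{Reachable}(v, \sigma)$, the set of labels
reachable from value v in store $\sigma$, is defined as follows:

$$
\begin{array}{rcl}
\mathit{Reachable}(\bot, \sigma) & = & \emptyset \\
\mathit{Reachable}(N, \sigma) & = & \emptyset \\
\mathit{Reachable}(B, \sigma) & = & \emptyset \\
\mathit{Reachable}(ls, \sigma) & = &
  ls \cup \left( \bigcup_{ol \in ls} \mathit{SVReachable}(\sigma[ol], \sigma) \right) \\
\mathit{SVReachable}(\bot, \sigma) & = & \emptyset \\
\mathit{SVReachable}(\langle \mathit{Tuple}\ v_1, \ldots, v_n \rangle, \sigma) & = &
  \bigcup_{i} \mathit{Reachable}(v_i, \sigma)
\end{array}
$$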
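

A minimal Python sketch of Definition 4.1 follows. It assumes the same illustrative encoding as
before (label sets as `frozenset`, stores as dictionaries from labels to tuples), and it replaces
the recursive formulation with a worklist that remembers visited labels. That substitution is a
choice made for this example: because one abstract label may summarize many objects, an abstract
store can contain cycles, and a naive recursive traversal might not terminate on them.

```python
def reachable(value, store):
    """Labels reachable from an abstract value in an abstract store.

    Scalars and ⊥ reach nothing; a label set reaches itself plus everything
    reachable from the fields of the tuples bound to its labels.
    """
    if not isinstance(value, frozenset):
        return frozenset()
    seen = set()
    work = list(value)
    while work:
        ol = work.pop()
        if ol in seen:
            continue
        seen.add(ol)
        fields = store.get(ol)        # a missing label is bound to ⊥
        if fields is None:
            continue
        for v in fields:
            if isinstance(v, frozenset):
                work.extend(v - seen)
    return frozenset(seen)


store = {
    "l0": ("N", frozenset({"l1"})),
    "l1": ("N", "N"),
    "l2": (frozenset({"l2"}),),      # a self-referential abstract tuple
}
from_l0 = reachable(frozenset({"l0"}), store)
from_l2 = reachable(frozenset({"l2"}), store)
```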


4.2.4 Ordering Operators on Domains 


Figure 4.4 contains the definitions of the ordering operators for each of the abstract domains. 
All of the domains are naturally ordered. These operators are necessary to show correctness and 
termination of the abstract interpreter. 


The domain orderings on labels are by name. We consider the set $\{l_0\}$ to be less than
$\{l_0, l_1\}$ regardless of what those locations may be bound to in a given store.


We say store $\sigma_1$ is less than store $\sigma_2$ if, for all labels ol in the universe of
object labels OL, the tuple to which ol is bound in $\sigma_1$ is less than the tuple to which ol is
bound in $\sigma_2$. Tuples are compared element-by-element using the value ordering described
above. Again, sets of labels are ordered by name, not by the values to which they may refer.
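

The orderings can be sketched in Python as follows, again under the assumed encoding of strings
for scalars, `frozenset` for label sets, tuples for abstract tuples, and dictionaries for stores.
The function names are hypothetical. Values not covered by an explicit case compare as false,
following the "otherwise" clause of the ordering on values.

```python
BOT, NUM, BOOL = "⊥", "N", "B"


def leq_value(v1, v2):
    """Ordering on denotable values; label sets are ordered by inclusion."""
    if v1 == BOT:
        return True
    if v1 == NUM and v2 == NUM:
        return True
    if v1 == BOOL and v2 == BOOL:
        return True
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 <= v2
    return False


def leq_tuple(t1, t2):
    """Elementwise ordering on tuples of equal arity; ⊥ is below everything."""
    if t1 == BOT:
        return True
    if t2 == BOT:
        return False
    return len(t1) == len(t2) and all(
        leq_value(a, b) for a, b in zip(t1, t2)
    )


def leq_store(s1, s2):
    """Pointwise ordering on stores; labels absent from s1 are ⊥ and always pass."""
    return all(leq_tuple(t, s2.get(ol, BOT)) for ol, t in s1.items())


smaller = {"l0": (NUM, BOT), "l1": (NUM, NUM)}
larger = {"l0": (NUM, frozenset({"l1"})), "l1": (NUM, NUM)}
```

With these two stores, `leq_store(smaller, larger)` holds while the converse does not, because the
second field of `l0` grows from bottom to a label set.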


4.3 Abstracting the Interpreter


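In the abstract interpreter, we cannot evaluate procedure calls by unfolding the body of the called
procedure because this would never terminate if any of the procedures were recursive. Instead, the
abstract interpreter constructs an input-output mapping for each procedure in a program. This
mapping describes the behavior of a procedure over each possible set of inputs. We stress that the
function mapping only approximately describes the behavior of the function.


$$
\begin{array}{rcl}
\bot \sqsubseteq_{V} v & & \\
ls_1 \sqsubseteq_{Ls} ls_2 & = & ls_1 \subseteq ls_2 \\
v_1 \sqsubseteq_{V} v_2 & = &
  \begin{cases}
  \mathit{True} & \text{if } v_1 = \bot \\
  \mathit{True} & \text{if } v_1, v_2 \text{ are both numbers} \\
  \mathit{True} & \text{if } v_1, v_2 \text{ are both booleans} \\
  v_1 \subseteq v_2 & \text{if } v_1, v_2 \text{ are both in } Ls \\
  \mathit{False} & \text{otherwise}
  \end{cases} \\
\sigma_1 \sqsubseteq_{Store} \sigma_2 & = &
  \bigwedge_{l_i \in OL} \sigma_1[l_i] \sqsubseteq_{Tuple} \sigma_2[l_i]
\end{array}
$$

where

$$
\langle \mathit{Tuple}\ v_1, \ldots, v_{n_1} \rangle \sqsubseteq_{Tuple}
\langle \mathit{Tuple}\ w_1, \ldots, w_{n_2} \rangle =
  \begin{cases}
  \bigwedge_{i} (v_i \sqsubseteq_{V} w_i) & \text{if } n_1 = n_2 \\
  \mathit{False} & \text{otherwise}
  \end{cases}
$$


Figure 4.4: Ordering operators on domains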


When we abstract the interpreter, we make a major change to the clause that evaluates procedure 
applications so that it looks up the result of a procedure application in the input-output mapping 
corresponding to the procedure being applied. The job of the program evaluator, the procedure that 
interprets programs, is to compute the input-output mappings for each procedure. The program 
evaluator iterates a function that improves the approximation of the input-output mappings of each
procedure until this iteration reaches a fixpoint. We describe this process in the remainder of this
section.


4.3.1 Computation of Input-Output Mappings 


A KID~ program can be viewed as a set of recursive function definitions. These definitions may 
be viewed as a set of equations defining the values of the functions, where the value of a function
$f_i$ is a mapping from values in the domain of $f_i$ to values in the range of $f_i$. A typical
system of function definition equations is shown below.

$$
\begin{array}{rcl}
f_1(x_1, \ldots, x_n) & = & \cdots f_i(\cdots) \cdots \\
 & \vdots & \\
f_m(x_1, \ldots, x_n) & = & \cdots f_i(\cdots) \cdots
\end{array}
$$


If the system of equations is monotonic with respect to the values of the functions, and the heights 
of all chains in the domains of the functions are bounded, then we can solve this system of equations 
by using fixpoint iteration. We start with an initial approximation to the solution and generate
successively improved approximations until we reach an approximation equal to its improved
approximation; this is the exact solution to the system of equations.


We start the fixpoint iteration by using an initial approximation of each function that returns 
bottom for all input values. 


$$
\begin{array}{rcl}
f_1^0(x_1, \ldots, x_n) & = & \bot \\
 & \vdots & \\
f_m^0(x_1, \ldots, x_n) & = & \bot
\end{array}
$$

We can use bottom as the initial approximation to function $f_i$ because it is a safe approximation
to the behavior of unfolding each function application zero times. In general, the value of $f_i^k$
is a safe approximation of the behavior of each function unfolded k times, even though it might not
be a safe approximation of unfolding each function k + 1 times. The value of $f_i^\infty$, however,
is a safe approximation to the behavior of function $f_i$ over any depth of unfolding.


At the (k + 1)th step in the fixpoint iteration, we substitute the kth approximation to function
$f_i$, namely $f_i^k$, for each use of $f_i$ in each equation. The substitution yields the (k + 1)th
approximation to the functions, as shown below:

$$
\begin{array}{rcl}
f_1^{k+1}(x_1, \ldots, x_n) & = & \cdots f_i^k(\cdots) \cdots \\
 & \vdots & \\
f_m^{k+1}(x_1, \ldots, x_n) & = & \cdots f_i^k(\cdots) \cdots
\end{array}
$$

Fixpoint iteration terminates when $f_i^{k+1} = f_i^k$ for all functions $f_i$ and all possible
input values. It is guaranteed to terminate when the domains and ranges of the functions are all
finite and all the functions are monotonic.
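

The iteration scheme itself is short enough to sketch in Python. The helper name `fixpoint` and
the two toy equations are assumptions chosen for illustration; they stand in for the function
equations of a real program. Each "function" here is reduced to a single set of labels, the step
function is monotonic, and the lattice of label sets is finite, so the loop terminates.

```python
def fixpoint(step, bottom):
    """Iterate `step` from `bottom` until the approximation stops changing."""
    current = bottom
    while True:
        improved = step(current)
        if improved == current:
            return current
        current = improved


def step(env):
    """One refinement of a toy recursive system:

        f1 = {l0} ∪ f2
        f2 = {l1} ∪ f1

    The right-hand sides use the previous approximation `env`.
    """
    return {
        "f1": frozenset({"l0"}) | env["f2"],
        "f2": frozenset({"l1"}) | env["f1"],
    }


bottom = {"f1": frozenset(), "f2": frozenset()}
solution = fixpoint(step, bottom)
```

Starting from the empty sets, the first refinement gives `f1 = {l0}` and `f2 = {l1}`, the second
gives both functions the set `{l0, l1}`, and the third changes nothing, so the iteration stops.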


We can view this process as finding the solution to the following equation: 


$$(f_1, \ldots, f_m) = Y(F)$$


where F is the function that takes an approximation of each of the functions f; and returns a 
refined approximation to each of the functions, and Y is the least fixpoint operator. 


4.3.2 Finiteness of the KID” Abstract Domains 


Although the domain of tuples is not finite, the domains of any particular function must be finite
because the functions are strongly typed (monomorphically typed). The set of labels L is finite
because programs are finite, and the depth of nesting of tuples passed as an argument to a function
depends on the type of that function. The same is true of the result of each function: the depth of
nesting of the tuples returned and the size of the sets of labels returned are both finite.
Therefore, the fixpoint iteration described above must terminate, and so the solution to the
recursive set of equations exists.
Representing Values of Functions


In the KID~ abstract interpreter, function values are represented by mappings from products of 
denotable values and a store to pairs consisting of a denotable value and a store. The signature of 
these mappings is 


$$\mathit{Fcn} = (V^* \times \mathit{Store}) \rightarrow (V \times \mathit{Store})$$


These mappings can be thought of as a table, or set of tuples, consisting of input values and the 
corresponding output values. 


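Let us consider an example function, f, whose type and function mapping type are given below.
If f has type $\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle \rightarrow N$,
then the Fcn mapping associated with procedure f will have type
$(V \times \mathit{Store}) \rightarrow (V \times \mathit{Store})$.

$$
\begin{array}{rcl}
f & : & \langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle \rightarrow N \\
\mathit{Mapping}_f & : & (V \times \mathit{Store}) \rightarrow (V \times \mathit{Store})
\end{array}
$$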


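Assume that there are only two locations, $l_0$, of type
$\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle$, and $l_1$, of type
$\langle \mathit{Tuple}\ N, N \rangle$:

$$
L = \{\; l_0^{\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle},\;
l_1^{\langle \mathit{Tuple}\ N, N \rangle} \;\}
$$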


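Given this knowledge, we can enumerate all values in the domain of f:

$$
\left\{ \begin{array}{l} \emptyset \\ \{l_0\} \end{array} \right\}
\times \bot_{Store} \left[\;
\left\{ \begin{array}{l}
l_0 \mapsto \bot \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, \{l_1\} \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle
\end{array} \right\}
\times
\left\{ \begin{array}{l}
l_1 \mapsto \bot \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle
\end{array} \right\}
\;\right]
$$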


The ‘×’ signs should be read as the cross product of the possibilities for the three portions of the
input domain: the value, the binding of label $l_0$ in the store, and the binding of label $l_1$ in
the store. For example, the least defined element in the domain of f’s mapping is:

$$(\emptyset, \bot_{Store})$$

and the most defined element is:

$$
(\{l_0\},\ \bot_{Store}[\,l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,])
$$


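Here is the range of possible results returned by function f:

$$
\left\{ \begin{array}{l} \bot \\ N \end{array} \right\}
\times \bot_{Store} \left[\;
\left\{ \begin{array}{l}
l_0 \mapsto \bot \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, \{l_1\} \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle
\end{array} \right\}
\times
\left\{ \begin{array}{l}
l_1 \mapsto \bot \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle
\end{array} \right\}
\;\right]
$$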


Note that because f is an extensive function with respect to the store, the result store must contain
the input store. For example, if f were applied to the store

$$
\bot_{Store}[\,l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]
$$

then the only possible result stores are

$$
\bot_{Store}[\,l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]
$$

and

$$
\bot_{Store}[\,l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]
$$

because all other stores in the range of f are less than or incomparable to the input store.


    { def foo(t) =
        { a = Select_1(t);
          b = Select_2(t);
          p = (a == 5);
          r = if p then b
              else { t' = ^{l_0}MakeTuple(5,7);
                     v = foo(t');
                   in v }
        in r }
      def f_0 () =
        { t0 = ^{l_1}MakeTuple(3,4);
          result = foo(t0);
        in result }


Figure 4.5: A recursive example


Computation of Function Mapping for an Example 


Now let us go through the steps of constructing a mapping for the recursive example described in 
the previous section. Figure 4.5 contains a program consisting of a recursive procedure foo and a 
call to foo from the main expression of the program. First, let us examine the domain of foo and 
the type of the mapping we will construct for foo. Then we will work informally through the steps 
of building the mapping. Because this program diverges if we try to unfold the procedure calls each 
time the interpreter encounters a procedure application, we have to compute the fixpoint of the 
function that takes the initial input-output mapping of the function (the empty function mapping) 
and produces the final function mapping. 


Figure 4.6 contains the domains of foo. Let us walk through the computation of the function
mapping for foo, step by step. We start by computing the value of foo on its least defined input,
and iterate until we reach a fixpoint.


We only compute the value of foo on an input value if that value arises during the abstract 
interpretation of the program. This set of values is what we consider the interesting portion of the 
domain of foo. During each iteration we compute a new approximation of the mapping of foo, 
and we keep track of all values to which foo has been applied. 


In order to compute the function mapping for foo, we start with an initial approximation that maps 
all inputs to bottom, then we use our current approximation to compute successively improved 
approximations until our approximation does not change. 
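$$
\begin{array}{rcl}
\mathit{foo} & : & (N \times N) \rightarrow N \\
\mathit{fooMapping} & : & (V \times \mathit{Store}) \rightarrow (V \times \mathit{Store})
\end{array}
$$

$$
\left\{ \begin{array}{l}
\bot \\ \emptyset \\ \{l_0\} \\ \{l_1\} \\ \{l_0, l_1\}
\end{array} \right\}
\times
\left\{ \begin{array}{l}
l_0 \mapsto \bot \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle
\end{array} \right\}
\times
\left\{ \begin{array}{l}
l_1 \mapsto \bot \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle
\end{array} \right\}
$$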


Figure 4.6: Domain of a function foo 


To compute a better approximation, we evaluate the body of procedure foo with each set of inputs 
that appears in the current approximation of the mapping to compute new output values. We 
record the new output values in the improved approximation of foo’s mapping. As we evaluate the 
body of foo, if we encounter applications of foo to input values that do not occur in the interesting 
domain, we add these values to the set of values in the interesting domain. 


The initial approximation for foo returns $\bot$ for all input values. We can then initiate the
abstract interpretation of foo by evaluating the body of the main procedure $f_0$. We encounter a
call with arguments:

$$(\{l_1\},\ [\,l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,])$$

The result from this application is approximated by $\bot$, and we add this value to the interesting
domain of foo.


To compute our next approximation to the mapping for foo, we evaluate its body on the single
value in the interesting domain. We get the following mapping:

$$
(\{l_1\},\ [\,l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]) \;\mapsto\;
(N,\ [\,l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,])
$$

and we add the input value

$$
(\{l_0\},\ \bot_{Store}[\,l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,])
$$

to foo’s interesting domain, because a call to foo with these input values was encountered during
the computation of the previous approximation.


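And after one more iteration we reach the following approximation for foo:

$$
\begin{array}{l}
(\{l_0\},\ [\,l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]) \;\mapsto\;
(N,\ [\,l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]) \\
(\{l_1\},\ [\,l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,]) \;\mapsto\;
(N,\ [\,l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\
l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle\,])
\end{array}
$$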


One more iteration yields the same value. Since we are not adding more entries to the interesting 
portion of the domain, and the values of each of the mapping entries have not changed, we have 
reached the fixpoint for foo projected onto the domain consisting of those inputs with the bindings 
shown in the mapping above. 
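

The sparse computation of the mapping can be imitated with a short Python sketch. To keep it
small, the sketch makes a simplifying assumption that the abstract interpreter does not: it
tracks only the abstract argument and result values and omits the stores. The names `foo_body`
and `compute_mapping` are hypothetical, and `foo_body` hard-codes the abstract behavior of the
body of foo from Figure 4.5 (both branches of the conditional are evaluated and joined).

```python
BOT, NUM = "⊥", "N"


def lub(v1, v2):
    """Least upper bound on the tiny value domain {⊥, N} used in this sketch."""
    return v2 if v1 == BOT else v1


def foo_body(t, call):
    """Abstract evaluation of the body of foo, values only."""
    then_value = NUM                  # b = Select_2(t) is a number
    t_prime = frozenset({"l0"})       # tuple allocated at the site labelled l0
    else_value = call(t_prime)        # v = foo(t'), looked up in the mapping
    return lub(then_value, else_value)


def compute_mapping(first_input):
    """Iterate over the interesting domain until mapping and domain are stable."""
    table = {}                        # current approximation: input -> output
    interesting = {first_input}       # inputs on which foo has been applied
    rounds = 0
    while True:
        rounds += 1
        discovered = set()

        def call(arg):
            if arg not in interesting:
                discovered.add(arg)   # a new element of the interesting domain
            return table.get(arg, BOT)  # missing entries behave as ⊥

        improved = {inp: foo_body(inp, call) for inp in interesting}
        if improved == table and not discovered:
            return table, rounds
        table = improved
        interesting |= discovered


mapping, rounds = compute_mapping(frozenset({"l1"}))
```

Under these assumptions the first round evaluates foo only on `{l1}` and discovers the input
`{l0}`; the second round fills in both entries; the third round changes nothing, which is the
stopping condition used in the walkthrough above.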
Function Environments


The abstract interpreter constructs a function environment for a program. Function environments,
members of domain FEnv, map function names to function values. Function names are drawn
from domain F, and function values are input-output mappings.

$$
\Phi \in \mathit{FEnv} = F \rightarrow ((V^* \times \mathit{Store}) \rightarrow (V \times \mathit{Store}))
$$


Because KID~ allows recursive function definitions, the interpreter must solve the set of recursive 
equations denoted by the program text that defines the function environment of the program. 


The way we keep track of the interesting domains of each function in the program is with a domain
map, or DMAP. Each D in DMAP is a mapping from function names to the interesting portions
of the domains of those functions. We also collect the change, or delta ($\Delta$), in the
interesting portion of the domain of a function.

$$D, \Delta^D \in \mathit{DMAP} = F \rightarrow \mathcal{P}(V^* \times \mathit{Store})$$


The expression evaluator returns a domain map delta as one of its results. 


4.3.3 Abstract Interpreter Definition


This section describes an algorithm for abstract interpretation of programs. The algorithm makes 
use of the fact that we are interested in only a few of the elements from the domains of the abstract 
functions defined in a program. This interpreter computes the function environment of a program 
sparsely. That is, the interpreter only computes the elements of the mapping corresponding to the 
input values in which we are interested and to any other inputs that are needed to compute the 
function environment for those interesting inputs. 


The function $\mathcal{SE}_A$ takes a simple expression and an environment, and returns the value
of the expression in that environment. The function $\mathcal{E}_A$ takes an expression, an
environment, a store, and a function environment, and returns the resulting value and store. The
function $\mathcal{PE}_A$ takes a complete program and returns the value and store resulting from
the execution of the program.


Note that the set of labels of objects allocated and deallocated during the execution of an expression 
or program is necessarily inexact. Under the abstract interpreter, these sets contain the abstraction 
of the object labels that may be allocated or deallocated under the standard interpreter. The most 
definite thing we can say is which labels were not allocated or deallocated — we cannot say that 
a given location was definitely allocated or deallocated. In the abstract interpreter, we do not 
compute the set of labels of objects that may be allocated or referenced within an expression — 
this is not needed to verify or insert deallocation commands. We do need to know which locations 
may be deallocated by an expression, however. 


The following are the signatures of the semantic functions: 


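$$
\begin{array}{rcl}
\mathcal{SE}_A & : & SE \rightarrow \mathit{Env} \rightarrow V \\
\mathcal{E}_A & : & E \rightarrow \mathit{Env} \rightarrow \mathit{Store} \rightarrow \mathit{FEnv}
  \rightarrow (V \times \mathit{Store} \times \mathit{DEVs} \times \mathit{DMAP}) \\
\mathcal{PE}_A & : & \mathit{Prog} \rightarrow
  (V \times \mathit{Store} \times \mathit{DEVs} \times \mathit{FEnv} \times \mathit{DMAP})
\end{array}
$$

where

$$
\begin{array}{rcll}
\rho \in \mathit{Env} & = & X \rightarrow V & \text{Environments} \\
\Delta^- \in \mathit{DEVs} & = & \mathcal{P}(OL \times AL) & \text{Deallocation Events}
\end{array}
$$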
Environments, members of domain $Env$, map variables, $X$, to denotable values. The empty
environment, $\bot_{Env}$, maps all variables to $\bot$. Bindings are added to an environment when we evaluate
the body of a function or a letrec block. Domain maps and domain map deltas map function 
names to sets of values in the interesting domains of those functions. Abstract deallocation events 
track the labels of the objects that were deallocated during the interpretation of an expression. 


Program Evaluator Definition 


Evaluation of a program, which is performed by $\mathcal{PE}_A$, defined below, consists of computing the
function environment $\Phi$ for the program, and then evaluating the main expression of the program
in that function environment. The following is the definition of the program interpreter.


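\[
\begin{aligned}
\mathcal{PE}_A[\![ pr ]\!] = \{\ & (\Phi_0, D_0) = InitialFEnv(pr); \\
& (\Phi, D) = ComputeFEnv(pr, \Phi_0, D_0); \\
& (v, \sigma, \Delta^-, \Delta^D) = \Phi[f_0][(\sigma)]; \\
\text{in}\ & (v, \sigma, \Delta^-, \Phi, \Delta^D)\ \}
\end{aligned}
\]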


The abstract interpreter first constructs a function environment ($\Phi$) that, for each function $f$ in the
program, maps particular input values of f to the result of applying f to those inputs. Whenever 
we encounter an application of a procedure f we fetch its input-output mapping from the incoming 
function environment. Then we determine the output value corresponding to the set of input values 
provided (including the store and activation label) and use that value as the result of the activation. 
We also make sure that the entry for function f and this set of inputs is non-bottom by adding 
these input values to the domain map for f. 


Once the abstract program interpreter has computed the function environment for the program, 
it evaluates the body of the main procedure $f_0$ of the program, and returns the result of this
evaluation, along with the function environment, as the result of abstract interpretation of the 
program. 


Computing the Function Environment of a Program 


The function InitialFEnv takes a program and returns a function environment and a domain map.
The initial function environment takes each function and returns bottom. The initial domain map 
takes a function name and returns the set containing bottom. 


\[
\begin{aligned}
InitialFEnv(pr) = \{\ & \Phi_0 = \bot_{FEnv}; \\
& \forall f_i \in F: \\
& \quad D_0[f_i] = \{\bot\} \\
\text{in}\ & (\Phi_0, D_0)\ \}
\end{aligned}
\]


The function ComputeFEnv, shown in Figure 4.7, iteratively improves the approximations of the 
function environment and the domain map until no further information is added. It does this 
by computing a new entry in the function map of each function $f_0$ to $f_k$ for each value in the
interesting domain of the function. It also gathers new approximations to the interesting domain 
of the function. The process of computing new approximations is monotonic; so this is guaranteed 
to reach a stable value. It returns the most precise approximation as its result. 
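\[
\begin{aligned}
& ComputeFEnv(pr, \Phi, D) = \\
& \{\ [\![ \{ \ldots;\ f_i(x_{i,1}, \ldots, x_{i,n_i}) = e_i;\ \ldots \} ]\!] = pr; \\
& \quad (\Phi_{f_0}, D_{f_0}) = \bigsqcup_{(v_1, \ldots, v_{n_0}, \sigma) \in D[f_0]}
    \{\ (v, \sigma', \Delta^-, \Delta^D) = \mathcal{E}_A[\![ e_0 ]\!]\ \bot_{Env}[v_1/x_1, \ldots, v_{n_0}/x_{n_0}]\ \sigma\ \Phi; \\
& \qquad\qquad \hat{\Phi} = \bot_{FEnv}[f_0 \mapsto [(v_1, \ldots, v_{n_0}, \sigma) \mapsto (v, \sigma', \Delta^-)]]; \\
& \qquad\quad \text{in}\ (\hat{\Phi}, \Delta^D)\ \} \\
& \quad \vdots \\
& \quad (\Phi_{f_k}, D_{f_k}) = \bigsqcup_{(v_1, \ldots, v_{n_k}, \sigma) \in D[f_k]}
    \{\ (v, \sigma', \Delta^-, \Delta^D) = \mathcal{E}_A[\![ e_k ]\!]\ \bot_{Env}[v_1/x_1, \ldots, v_{n_k}/x_{n_k}]\ \sigma\ \Phi; \\
& \qquad\qquad \hat{\Phi} = \bot_{FEnv}[f_k \mapsto [(v_1, \ldots, v_{n_k}, \sigma) \mapsto (v, \sigma', \Delta^-)]]; \\
& \qquad\quad \text{in}\ (\hat{\Phi}, \Delta^D)\ \} \\
& \quad \Phi' = \bigsqcup_i \Phi_{f_i}; \\
& \quad D' = \bigsqcup_i D_{f_i}; \\
& \quad (\Phi'', D'') = \text{if}\ \Phi' = \Phi \wedge D' = D \\
& \qquad\qquad \text{then}\ (\Phi', D') \\
& \qquad\qquad \text{else}\ ComputeFEnv(pr, \Phi', D'); \\
& \text{in}\ (\Phi'', D'')\ \}
\end{aligned}
\]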


Figure 4.7: Procedure to compute function environment 
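
The iteration in Figure 4.7 can be sketched in Python as a sparse fixpoint computation. This is only an illustration under simplifying assumptions, not the implementation described in this thesis: abstract values are frozensets joined by union, the store and the deallocation events are omitted, and the names `compute_fenv`, `bodies`, and `lookup` are hypothetical.

```python
BOTTOM = frozenset()  # least abstract value in this toy lattice

def compute_fenv(bodies, fenv, dmap):
    """Iterate until neither the function environment nor the domain map changes.

    bodies: function name -> callable(args, lookup) giving an abstract result
    fenv:   function name -> {args: result}   (the function environment)
    dmap:   function name -> set of args      (the interesting domains)
    """
    while True:
        new_fenv = {f: dict(table) for f, table in fenv.items()}
        new_dmap = {f: set(inputs) for f, inputs in dmap.items()}
        for f, body in bodies.items():
            for args in dmap[f]:
                def lookup(g, g_args):
                    # A call records its input as interesting and reads the
                    # current approximation instead of evaluating g's body.
                    new_dmap[g].add(g_args)
                    return fenv[g].get(g_args, BOTTOM)
                result = body(args, lookup)
                new_fenv[f][args] = new_fenv[f].get(args, BOTTOM) | result
        if new_fenv == fenv and new_dmap == dmap:
            return new_fenv, new_dmap
        fenv, dmap = new_fenv, new_dmap

# Toy program: f(x) = if ... then x else g(x);  g(y) = <some number>
def f_body(args, lookup):
    (x,) = args
    return x | lookup("g", (x,))   # both branches of the conditional are joined

def g_body(args, lookup):
    return frozenset({"N"})

start = (frozenset({"l0"}),)
fenv, dmap = compute_fenv(
    {"f": f_body, "g": g_body},
    {"f": {}, "g": {}},
    {"f": {start}, "g": set()},
)
```

Only the inputs that are actually requested (here, the single argument vector `start`) ever receive an entry, which is the sense in which the function environment is computed sparsely.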


Simple Expression Evaluator Definition 


The simple expression evaluator is defined in Figure 4.8. Because the integer and boolean domains 
have been summarized by single values, $N$ representing any number and $B$ representing any boolean,
evaluation of constants returns less information than in the instrumented and standard interpreters. 
However, evaluation of variables is the same — the value of the variable is found in the current 
environment. 


Expression Evaluator Definition 


This section develops the definition of the abstracted expression evaluator. We can think of the 
expression evaluator as providing the rules for simplifying the right-hand-sides of the equations 
that define the function environment of a program. 
\[
\begin{aligned}
\mathcal{SE}_A[\![ n ]\!]\rho &= N && \text{where $n$ is a number} \\
\mathcal{SE}_A[\![ b ]\!]\rho &= B && \text{where $b$ is a boolean} \\
\mathcal{SE}_A[\![ x ]\!]\rho &= \rho[x] && \text{where $x$ is a variable}
\end{aligned}
\]


Figure 4.8: Abstracted simple expression evaluator 


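\[
\begin{aligned}
\mathcal{E}_A[\![ n ]\!]\rho\sigma\Phi &= (\mathcal{SE}_A[\![ n ]\!]\rho,\ \sigma,\ \emptyset,\ \bot_{DMAP}) \\
\mathcal{E}_A[\![ b ]\!]\rho\sigma\Phi &= (\mathcal{SE}_A[\![ b ]\!]\rho,\ \sigma,\ \emptyset,\ \bot_{DMAP}) \\
\mathcal{E}_A[\![ x ]\!]\rho\sigma\Phi &= (\mathcal{SE}_A[\![ x ]\!]\rho,\ \sigma,\ \emptyset,\ \bot_{DMAP}) \\
\mathcal{E}_A[\![ +(se_1, se_2) ]\!]\rho\sigma\Phi &= (N,\ \sigma,\ \emptyset,\ \bot_{DMAP})
\end{aligned}
\]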


Figure 4.9: Evaluation of simple expressions and primitive operators 


The first four clauses of the interpreter, shown in Figure 4.9, define the semantics of numbers,
booleans, variables, and arithmetic primitives. The first three clauses describe how the evaluator
handles simple expressions: it calls the abstract simple expression evaluator to produce the result
values. The fourth clause shows how primitive arithmetic operations are interpreted: either $N$ or
$B$ is returned, depending on the type of the operator. The evaluation of primitive arithmetic and
logical operations can proceed without examining the arguments to the operator because the values
of integers and booleans are ignored. These four clauses do not modify the store or deallocate any
objects, so $\sigma$ and $\emptyset$ are returned, and they do not add any elements to the delta
domain map, so $\bot_{DMAP}$ is returned.


The first major difference between the abstract expression interpreter and the instrumented
interpreter is in the handling of function applications. When we interpret a procedure application in
the abstract interpreter, we look up the result values in the incoming function environment rather
than directly evaluating the body of the function, as we did in the standard interpreters. First, we
compute the input values to function $f$. We use these values and the current store $\sigma$ as input
into the function map for $f$. The clause of the interpreter for function applications is shown in
Figure 4.10. Note that we return a delta domain map $\Delta^D$ with the singleton set containing the
current input value for procedure $f$. This ensures that we compute the value of function $f$ applied
to this input value in future iterations of ComputeFEnv.


The evaluation of conditionals, shown in Figure 4.11, computes a summarization of the value that 
the conditional could yield under any execution of the program. In the abstracted interpreter, the 
predicate is ignored and both branches of the conditional are executed. The least upper bound of the
values returned by the conditional branches is returned as the result of the conditional expression. 


Abstract evaluation of block expressions is nearly the same as instrumented evaluation of block 
expressions. Figure 4.12 shows the clause of the abstract interpreter for block expressions. 


The abstract evaluation rules for tuple primitives are shown in Figure 4.13. The evaluation of the 
MakeTuple primitive is similar to the instrumented interpreter clause, except that a singleton set 
containing the abstract object label is returned as the result. Note that the new abstract object
label is just $\varepsilon{:}l$, where $l$ is the label on the MakeTuple primitive.




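\[
\begin{aligned}
& \mathcal{E}_A[\![ {}^{l}f(se_1, \ldots, se_n) ]\!]\rho\sigma\Phi = \\
& \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
& \quad \vdots \\
& \quad v_n = \mathcal{SE}_A[\![ se_n ]\!]\rho; \\
& \quad (v, \sigma', \Delta^-) = \Phi[f][v_1, \ldots, v_n, \sigma]; \\
& \quad \Delta^D = \bot_{DMAP}[f \mapsto \{(v_1, \ldots, v_n, \sigma)\}]; \\
& \text{in}\ (v, \sigma', \Delta^-, \Delta^D)\ \}
\end{aligned}
\]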


Figure 4.10: Abstract evaluation of function applications 


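\[
\begin{aligned}
\mathcal{E}_A[\![ \text{if}(se_0, e_1, e_2) ]\!]\rho\sigma\Phi = \{\ & (v_1, \sigma_1, \Delta^-_1, \Delta^D_1) = \mathcal{E}_A[\![ e_1 ]\!]\rho\sigma\Phi; \\
& (v_2, \sigma_2, \Delta^-_2, \Delta^D_2) = \mathcal{E}_A[\![ e_2 ]\!]\rho\sigma\Phi; \\
\text{in}\ & (v_1 \sqcup v_2,\ \sigma_1 \sqcup \sigma_2,\ \Delta^-_1 \cup \Delta^-_2,\ \Delta^D_1 \sqcup \Delta^D_2)\ \}
\end{aligned}
\]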


Figure 4.11: Evaluation of conditional expressions 
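
As a rough illustration of the conditional clause, the following Python sketch joins the results of two already-evaluated branches. It assumes a simplified encoding that is not taken from this thesis: numbers and booleans are the strings "N" and "B", references are frozensets of labels, stores are dictionaries from labels to tuples, and the domain map delta is left out. The function names are hypothetical.

```python
def join(a, b):
    """Least upper bound of two abstract values (None plays the role of bottom)."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a | b          # label sets join by union
    if a == b:
        return a              # "N" joined with "N", or "B" with "B"
    raise TypeError(f"ill-typed join of {a!r} and {b!r}")

def join_stores(s1, s2):
    """Join two abstract stores label by label, field by field."""
    out = dict(s1)
    for label, fields in s2.items():
        if label in out:
            out[label] = tuple(join(x, y) for x, y in zip(out[label], fields))
        else:
            out[label] = fields
    return out

def eval_if(then_result, else_result):
    """Combine the (value, store, deallocated) triples of both branches."""
    (v1, s1, d1), (v2, s2, d2) = then_result, else_result
    return join(v1, v2), join_stores(s1, s2), d1 | d2

then_branch = (frozenset({"l0"}), {"l0": ("N", "N")}, frozenset())
else_branch = (frozenset({"l1"}), {"l1": ("N", "B")}, frozenset({"l2"}))
value, store, deallocated = eval_if(then_branch, else_branch)
```

Because the predicate is ignored, the result may refer to either tuple, and any label deallocated on one branch is reported as possibly deallocated by the whole conditional.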


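\[
\begin{aligned}
& \mathcal{E}_A[\![ \{ Bs;\ Ds\ \text{in}\ x \} ]\!]\rho\sigma\Phi = \\
& \{\ [\![ x_1 = e_1; \ldots; x_n = e_n ]\!] = Bs; \\
& \quad [\![ Dealloc(y_1); \ldots; Dealloc(y_m) ]\!] = Ds; \\
& \quad \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
& \quad (\rho', \sigma', \Delta^-, \Delta^D) = EvalBindings_A(Bs, \Phi, \rho_0, \sigma); \\
& \quad \Delta^{-\prime} = \Delta^- \cup \bigcup_{y_i} \rho'(y_i); \\
& \text{in}\ (\rho'[x], \sigma', \Delta^{-\prime}, \Delta^D)\ \} \\
& \text{where} \\
& EvalBindings_A([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \Phi, \rho, \sigma) = \\
& \{\ (v_1, \sigma_1, \Delta^-_1, \Delta^D_1) = \mathcal{E}_A[\![ e_1 ]\!]\rho\sigma\Phi; \\
& \quad \vdots \\
& \quad (v_n, \sigma_n, \Delta^-_n, \Delta^D_n) = \mathcal{E}_A[\![ e_n ]\!]\rho\sigma\Phi; \\
& \quad \rho' = \rho[(v_1 \sqcup \rho[x_1])/x_1, \ldots, (v_n \sqcup \rho[x_n])/x_n]; \\
& \quad \sigma' = \bigsqcup_i \sigma_i; \\
& \quad \Delta^{-\prime} = \bigcup_i \Delta^-_i; \\
& \quad \Delta^{D\prime} = \bigsqcup_i \Delta^D_i; \\
& \quad (\rho'', \sigma'', \Delta^{-\prime\prime}, \Delta^{D\prime\prime}) = \\
& \qquad \text{if}\ \rho' \sqsubseteq \rho \wedge \sigma' \sqsubseteq \sigma \\
& \qquad \text{then}\ (\rho', \sigma', \Delta^{-\prime}, \Delta^{D\prime}) \\
& \qquad \text{else}\ EvalBindings_A([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \Phi, \rho', \sigma') \\
& \text{in}\ (\rho'', \sigma'', \Delta^{-\prime\prime}, \Delta^{D\prime\prime})\ \}
\end{aligned}
\]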


Figure 4.12: Evaluation of block expressions 
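\[
\begin{aligned}
& \mathcal{E}_A[\![ {}^{l}MakeTuple(se_1, \ldots, se_m) ]\!]\rho\sigma\Phi = \\
& \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
& \quad \vdots \\
& \quad v_m = \mathcal{SE}_A[\![ se_m ]\!]\rho; \\
& \quad v_{tuple} = (Tuple\ v_1, \ldots, v_m); \\
& \quad ol = \varepsilon{:}l; \\
& \quad v'_{tuple} = \sigma[ol]; \\
& \quad \sigma' = \sigma[ol \mapsto (v_{tuple} \sqcup v'_{tuple})]; \\
& \text{in}\ (\{ol\}, \sigma', \emptyset, \bot_{DMAP})\ \} \\
& \\
& \mathcal{E}_A[\![ Select_i(se) ]\!]\rho\sigma\Phi = \\
& \{\ ls = \mathcal{SE}_A[\![ se ]\!]\rho; \\
& \quad (Tuple\ v_1, \ldots, v_m) = \bigsqcup_{ol \in ls} \sigma[ol]; \\
& \text{in}\ (v_i, \sigma, \emptyset, \bot_{DMAP})\ \}
\end{aligned}
\]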


Figure 4.13: Abstract evaluation of tuple primitives 


In the clause for primitive $Select_i$, note that we took the least upper bound of all the tuples that
may be referred to by $ls$, and then returned the $i$th component of that tuple. We could also have
selected the $i$th components of all the tuples to which the set $ls$ refers, and then returned the least
upper bound of these values. We can see that the two methods are equivalent by examining the
definitions of least upper bound on values and tuples.
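
The equivalence of the two selection orders can be checked on a small example. The Python sketch below is illustrative only and assumes a simplified encoding of abstract values ("N" and "B" as strings, references as frozensets of labels, tuples joined componentwise); the helper names are hypothetical, and component indices are zero-based here.

```python
from functools import reduce

def join(a, b):
    """Least upper bound of two abstract values of the same type."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a | b
    if a == b:
        return a
    raise TypeError(f"ill-typed join of {a!r} and {b!r}")

def join_tuples(t1, t2):
    """Tuples are joined component by component."""
    return tuple(join(x, y) for x, y in zip(t1, t2))

def select_after_join(i, labels, store):
    """Join all tuples that ls may refer to, then take component i."""
    merged = reduce(join_tuples, [store[l] for l in sorted(labels)])
    return merged[i]

def join_after_select(i, labels, store):
    """Take component i of every tuple, then join the components."""
    return reduce(join, [store[l][i] for l in sorted(labels)])

store = {
    "l0": ("N", frozenset({"l2"})),
    "l1": ("N", frozenset({"l3"})),
}
ls = frozenset({"l0", "l1"})
results = [(select_after_join(i, ls, store), join_after_select(i, ls, store))
           for i in range(2)]
```

Both orders agree because the join on tuples is defined componentwise, so projecting a component commutes with taking the join.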


4.4  Soundness of the Abstracted Interpreter 


This section shows that our abstract interpreter is sound. 




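Theorem 4.2 The abstracted interpreter $\mathcal{E}_A$ is extensive with respect to stores.

\[
\begin{aligned}
& \forall \rho \in Env,\ \sigma_0 \in Store,\ \Phi \in FEnv, \\
& \quad (v, \sigma_1, \Delta^-, \Delta^D) = \mathcal{E}_A[\![ e ]\!]\rho\sigma_0\Phi \;\Rightarrow\; \sigma_0 \sqsubseteq \sigma_1
\end{aligned}
\]

Proof:

Similar to the proof of the extensionality of the standard interpreter. $\blacksquare$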


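Theorem 4.3 The interpreter functions $\mathcal{SE}_A$ and $\mathcal{E}_A$ are monotonic.

\[
\begin{aligned}
& \forall se \in SE,\ \forall \rho_0, \rho_1 \in Env, \\
& \quad \rho_0 \sqsubseteq \rho_1 \;\Rightarrow\; \mathcal{SE}_A[\![ se ]\!]\rho_0 \sqsubseteq \mathcal{SE}_A[\![ se ]\!]\rho_1 \\
& \forall e \in E,\ \forall \rho_0, \rho_1 \in Env,\ \forall \sigma_0, \sigma_1 \in Store,\ \forall \Phi_0, \Phi_1 \in FEnv, \\
& \quad \rho_0 \sqsubseteq \rho_1 \wedge \sigma_0 \sqsubseteq \sigma_1 \wedge \Phi_0 \sqsubseteq \Phi_1 \;\Rightarrow\; \mathcal{E}_A[\![ e ]\!]\rho_0\sigma_0\Phi_0 \sqsubseteq \mathcal{E}_A[\![ e ]\!]\rho_1\sigma_1\Phi_1
\end{aligned}
\]

Proof:

Similar to the proof of the monotonicity of the standard interpreter. $\blacksquare$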
Finally, we require that the abstract interpreter always terminates in a finite amount of time.

Theorem 4.4 The abstract program evaluator $\mathcal{PE}_A$ always terminates in a finite amount of time.


Proof: 


The simple evaluator $\mathcal{SE}_A$ terminates because it either returns the value of a literal constant
or else looks up a variable in the environment.

We use structural induction to show that the expression evaluator $\mathcal{E}_A$ terminates in a finite
amount of time. The expression evaluator $\mathcal{E}_A$ has three cases: function application
expressions, letrec block expressions, and all other expressions.


— The evaluation of function applications takes a finite amount of time because the ex- 
pression evaluator makes a finite number of calls to the simple expression evaluator, and 
then looks up the result of the function application in the function environment. 


— The evaluation of letrec blocks consists of iterating to refine the environment and store of
the body of the block. Each iteration consists of evaluating a finite number of expressions,
so each iteration takes a finite amount of time (using our induction hypothesis). All chains
in our domains are finite, so it takes only a finite number of iterations
for the values of the environment and store to climb to their limits.


— The evaluation of all other expressions consists of evaluating a finite number of subex- 
pressions and combining the results in some way. Evaluating the subexpressions and 
combining the results each take a finite amount of time. 


Finally, we require the computation of the function environment to take a finite amount
of time. Each iteration of this process consists of evaluating each function over each value in
the interesting portion of that function's domain. All of our value domains are bounded by
program size, so the functions' domains must be finite. Evaluation of the body of the function
uses the expression evaluator, so that must take a finite amount of time. The fixpoint iteration
used to compute the function environment must also terminate in a finite number of steps
because the sizes of all domains are finite.


4.5 Safety of the Abstracted Interpreter 


In this section we show that the abstract interpreter $\mathcal{PE}_A$ preserves the behavior of the
instrumented interpreter.

An abstract interpretation is considered to be safe if the abstraction of a function preserves the
behavior of the concrete function.


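Definition 4.5 (Abstract Interpretation Safety) Given domains $A$, $\bar{A}$, $B$, and $\bar{B}$, and
abstraction functions $Abs_A : A \rightarrow \bar{A}$ and $Abs_B : B \rightarrow \bar{B}$, we say an abstract function
$\bar{f} : \bar{A} \rightarrow \bar{B}$ is safe for concrete function $f : A \rightarrow B$ if the following condition holds:

\[
\forall a \in A,\ \forall \bar{a} \in \bar{A}, \quad
Abs_A(a) \sqsubseteq_{\bar{A}} \bar{a} \;\Rightarrow\; Abs_B(f(a)) \sqsubseteq_{\bar{B}} \bar{f}(\bar{a})
\]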


If our abstract interpreter is safe by this definition, then it must preserve object reachability. 




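Theorem 4.6 $\mathcal{SE}_A$ is safe for $\mathcal{SE}$:

\[
\begin{aligned}
& \forall se \in SE,\ \forall \rho \in Env,\ \forall \bar{\rho} \in Env_A, \\
& \quad Abs_{Env}(\rho) \sqsubseteq \bar{\rho} \;\Rightarrow\; Abs_V(\mathcal{SE}[\![ se ]\!]\rho) \sqsubseteq \mathcal{SE}_A[\![ se ]\!]\bar{\rho}
\end{aligned}
\]

Proof:

By structural induction over $SE$:

— If $se$ is a boolean, then $\mathcal{SE}$ returns either True or False, which abstract to $B$, and
$\mathcal{SE}_A$ returns $B$.

— If $se$ is a number, then $\mathcal{SE}$ returns a number, which abstracts to $N$, and $\mathcal{SE}_A$ returns $N$.

— If $se$ is a variable $x$, then $\mathcal{SE}$ returns $\rho[x]$ and $\mathcal{SE}_A$ returns $\bar{\rho}[x]$. By our
definition of $Abs_{Env}$,

\[ Abs_{Env}(\rho) \sqsubseteq \bar{\rho} \;\Rightarrow\; Abs_V(\rho[x]) \sqsubseteq \bar{\rho}[x] \]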


Theorem 4.7 $\mathcal{E}_A$ is safe for $\mathcal{E}_I$. Given function environment $\Phi$ for program $pr$:

\[
\begin{aligned}
& \forall e \in pr,\ \forall \alpha \in AL,\ \forall \rho \in Env,\ \forall \bar{\rho} \in Env_A,\ \forall \sigma \in Store,\ \forall \bar{\sigma} \in Store_A, \\
& \quad Abs_{Env}(\rho) \sqsubseteq \bar{\rho} \wedge Abs_{Store}(\sigma) \sqsubseteq \bar{\sigma} \;\Rightarrow\; Abs(\mathcal{E}_I[\![ e ]\!]\rho\sigma\alpha) \sqsubseteq \mathcal{E}_A[\![ e ]\!]\bar{\rho}\bar{\sigma}\Phi
\end{aligned}
\]

Proof:

By structural induction over $E$:

— If $e$ is a simple expression, then $\mathcal{E}_I$ and $\mathcal{E}_A$ call the corresponding simple evaluators, so
$\mathcal{E}_A$ is safe for $\mathcal{E}_I$.

— If $e$ is a primitive arithmetic expression, then $\mathcal{E}_A$ returns either $N$ or $B$, depending on
its type. These values contain the abstraction of all possible values that $\mathcal{E}_I$ could return.

— If $e$ is a function application, then $\mathcal{E}_A$ calls $\mathcal{SE}_A$ to evaluate the arguments. These
abstract argument values contain the abstractions of the corresponding calls to $\mathcal{SE}_I$
made by $\mathcal{E}_I$. The function environment for a program maps a set of abstract inputs
to the most general value a function could return when applied to any concrete inputs
contained in the abstract inputs. Therefore, $\mathcal{E}_A$ is safe for $\mathcal{E}_I$ for function applications
because it looks up the result in the function environment.

— If $e$ is a conditional, then $\mathcal{E}_A$ returns the least upper bound of the evaluation of both
branches of the conditional. This value is greater than the result of evaluating either of
the branches of the conditional, so it must contain the value of $\mathcal{E}_I$ applied to the branch
of the conditional that is taken under the instrumented interpreter.

— If $e$ is a letrec block, then evaluation consists of fixpoint iteration to compute the
block's environment and store. For each iteration, we take an approximation of the
environment and store and generate refined approximations. This process is safe, by our
induction hypothesis, because we call $\mathcal{E}_A$ and $\mathcal{E}_I$ on the subexpressions to compute the
contributions to the new approximations to the environment and store. The final result
must be safe, because each iteration is safe and because both $\mathcal{E}_A$ and $\mathcal{E}_I$ are monotonic.

— If $e$ is a tuple primitive, then the simple expression evaluators are called to evaluate the
arguments and a tuple is constructed or dereferenced. The object label constructed by
$\mathcal{E}_A$ is an abstraction of the object label constructed by $\mathcal{E}_I$. Any tuple allocated by $\mathcal{E}_A$ is
an abstraction of a tuple allocated by $\mathcal{E}_I$ because each of the components under $\mathcal{E}_A$ is an
abstraction of the corresponding value under $\mathcal{E}_I$.


Theorem 4.8 $\mathcal{PE}_A$ is safe for $\mathcal{PE}_I$.

\[ \forall pr \in Prog, \quad Abs(\mathcal{PE}_I[\![ pr ]\!]) \sqsubseteq \mathcal{PE}_A[\![ pr ]\!] \]

Proof:

In order to show that the abstract program interpreter is safe for the instrumented program
interpreter, we must show that the abstract interpreter constructs a function environment
that is safe with respect to the behavior of each of the functions in the program.

We do this by induction on the depth of nesting of function calls.

— Base case: The initial function environment $\Phi^0$ maps all functions to input-output tables
that map all inputs to bottom. $\Phi^0$ is safe for all expressions that do not call procedures.

— Induction hypothesis: We assume that function environment $\Phi^k$ is safe for the abstract
interpretation of expressions that expand to a call depth of $k$. To compute $\Phi^{k+1}$, the
$(k+1)$st approximation to the function environment, we evaluate the body of
each function applied to each value in the interesting domain of the function using $\mathcal{E}_A$
and function environment $\Phi^k$. This yields $\Phi^{k+1}$, which is safe for expressions that expand
to a call depth of $k+1$, because $\mathcal{E}_A$ is safe for $\mathcal{E}_I$.


4.6 Determining Object Lifetimes Statically 


We can apply the abstract interpreter to a program prog to get a value, a store, and a set of
deallocation events. Consider the example shown in Figure 4.14. The result of evaluating the
program is a number and a store containing one tuple. If we follow the execution of the application
of f to 68, we see that its result is

\[ (\ N,\ \ \bot_{Store}[\varepsilon{:}l_0 \mapsto (Tuple\ N, N)],\ \ \emptyset,\ \ \Delta^D\ ) \]

meaning that a number is returned as the result and that a tuple labeled $\varepsilon{:}l_0$ was allocated during
the execution of the application. Because the tuple is not reachable from the result we know that
the lifetime of the tuple ends when the invocation of f ends. With a little more information about
which identifiers in f are bound to the tuple, we could transform the program so that the storage
associated with the tuple is reclaimed when f terminates.




{ def f(w) =
    { t = g(w);
      a = Select_1(t);
      b = Select_2(t);
      r = (a * b);
     in r };
  def g(x) =
    { y = (x - 21);
      t = ^{l_0}MakeTuple(x,y);
     in t };
  def f_0() =
    f(68)
};

Figure 4.14: Example with non-nested structures


{ def f(w) =
    { t1 = ^{k_0}g(w);
      w2 = w * 2;
      t2 = ^{k_1}g(w2);
      r = (w * w2);
      t3 = ^{l_1}MakeTuple(t1,t2,r);
     in t3 };
  def g(x) =
    { y = (x - 21);
      t = ^{l_0}MakeTuple(x,y);
     in t };
  def f_0() =
    ^{k_2}f(68);
};

Figure 4.15: Example with false sharing


The example shown earlier in this chapter in Figure 4.5 is slightly more complicated than the 
one we just did. It consists of a recursive procedure, foo, that allocates a tuple in each recursive 
iteration. We went through the steps of abstract interpretation in detail in Section 4.3.2, obtaining
the following value:

\[ (\ N,\ \ \bot_{Store}[\varepsilon{:}l_0 \mapsto (Tuple\ N, N),\ \varepsilon{:}l_1 \mapsto (Tuple\ N, N)],\ \ \emptyset,\ \ \Delta^D\ ) \]


One thing we can conclude by examining the program and this result is that both tuples allocated 
are no longer in use at the end of the program, because the program returns a number as its result. 


The example in Figure 4.15 is interesting because it shows how the abstraction of labels can cause
apparent sharing of structures, when in the standard or instrumented semantics there actually
would be no sharing. The result of this program under abstract interpretation is:

\[ (\ \{l_1\},\ \ \bot_{Store}[\varepsilon{:}l_0 \mapsto (Tuple\ N, N),\ \varepsilon{:}l_1 \mapsto (Tuple\ \{l_0\}, \{l_0\}, N)],\ \ \emptyset,\ \ \Delta^D\ ) \]

That is, the result of the program is the structure contained in locations $\{l_1\}$, which is a three-tuple
containing a reference to location $l_0$, another reference to location $l_0$, and a number. Note that while
the tuples allocated by the two calls to g would have been allocated in different locations $\varepsilon.k_2.k_0{:}l_0$
and $\varepsilon.k_2.k_1{:}l_0$ under the instrumented semantics, they have been assigned the same location under
the abstract semantics. Thus, any analysis performed using this abstraction of activation labels will
not be able to distinguish the tuples allocated by distinct calls to a procedure. Chapter 6 discusses
an improved approximation of activation labels that solves this problem.
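
The false sharing can be reproduced with a small Python sketch of the label abstraction. The string encoding of labels used here (an activation path, a colon, then the allocation-site label, with "eps" standing for the empty activation label) is an assumption made for illustration, and `abstract_label` is a hypothetical helper rather than a function defined in this thesis.

```python
def abstract_label(concrete: str) -> str:
    """Drop the activation path, keeping only the allocation-site label."""
    _activation_path, _, site = concrete.rpartition(":")
    return "eps:" + site

# Labels of the tuples allocated by the two calls to g under the
# instrumented semantics (assumed encoding).
first_call = "eps.k2.k0:l0"
second_call = "eps.k2.k1:l0"

# Both concrete labels collapse to one abstract label.
abstract = {abstract_label(first_call), abstract_label(second_call)}
```

Two distinct concrete locations map to a single abstract location, so the abstract store cannot tell the two tuples apart.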


Chapter 5 presents two algorithms that use the abstract interpreter defined in this chapter. The 
first algorithm verifies the safety of deallocation commands in programs, and the second inserts 
deallocation commands into programs. The basic approach is similar to what we have done in 
this section. The compiler uses the abstract interpreter to compute input-output mappings of 
all procedures. Then the compiler processes the body of each procedure, computing the possible 
bindings of all variables in the body of the procedure and using our deallocation safety criteria to 
verify or insert deallocation commands. 


Chapter 5 


Verifying and Inserting Deallocation 
Commands 


We have seen how to interpret KID~ programs in such a way as to determine what objects are cre- 
ated by the program (or each procedure in the program) and what objects passed into a procedure 
are reachable from the result of that procedure. We now investigate how to turn our abstract inter- 
preter into an algorithm to verify deallocation commands and an algorithm to insert deallocation 
commands. 


The verification and insertion algorithms operate on a whole program, one procedure at a time.
Both algorithms compute the function environment for the program and then recursively traverse
the body of each procedure, calling the abstract expression evaluator $\mathcal{E}_A$ to provide a summary
of the value to which each identifier could be bound.


Both the verification and insertion algorithms must analyze procedure bodies with respect to a set 
of arguments provided to that procedure. In this chapter we discuss how the choice of input values 
affects the performance of the analysis of procedure bodies. We also describe how we choose input
values for use in the verification and insertion algorithms.


As we did in Chapter 4, we restrict our discussion in this chapter to programs using tuples as 
their only data structure. In the next two chapters we discuss both the additions to the abstract 
interpreter, and the insertion and verification algorithms needed to handle arrays, algebraic types, 
and lists. 


The first section of this chapter presents formal conditions for the correctness of a deallocation 
statement. The second section discusses the choice of input values used during the analysis of 
a procedure and presents an algorithm for choosing these values. The third section presents a 
mechanical algorithm for verifying the correctness of deallocation statements in KID~ programs. 
The final section of this chapter presents a simple algorithm for inserting correct deallocation 
statements into a program. 


5.1 Object Deallocation Safety 


Let us consider a deallocation statement in block expression $e$, shown below, and use the abstract
semantics to show the conditions under which this deallocation statement can never lead to a
run-time error. Expression $e$ is a generic letrec expression containing several deallocation commands.

\[
\begin{aligned}
e = \{\ & x_1 = e_1; \\
& \vdots \\
& x_n = e_n; \\
& Dealloc(y_1); \\
& \vdots \\
& Dealloc(y_m) \\
\text{in}\ & x_i\ \}
\end{aligned}
\]

where the environment, store, and function environment in which $e$ is to be evaluated are $\rho$, $\sigma$, and
$\Phi$, respectively.


We can compute environment $\rho'$ and store $\sigma'$, the resulting environment and store for the block
bindings, $\Delta^-$, the set of labels deallocated by the block bindings, and $v$, which is the result of the
evaluation of the expression, as shown below:

\[
\begin{aligned}
\rho_0 &= \rho[\bot/x_1, \ldots, \bot/x_n] \\
(\rho', \sigma', \Delta^-, \Delta^D) &= EvalBindings_A([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \Phi, \rho_0, \sigma, \bot_{DMAP}) \\
v &= \rho'[x_i] \\
R &= Reachable(v, \sigma') \\
I &= \bigcup_{y \in FV(e)} Reachable(\rho'[y], \sigma)
\end{aligned}
\]


Consider each variable $y_i$ whose value is deallocated in the above block expression. If the value of
$y_i$ is to be deallocated safely upon termination of the letrec block, then the value to which $y_i$ is
bound, $\rho'[y_i]$, must be a reference $ls$ to a tuple. Furthermore, that tuple must have been allocated
within the execution of expression $e$. Therefore, none of the labels to which $y_i$ could be bound may
be in the set of labels inherited from the context in which this expression is executed. Also, none
of the labels in $\rho'[y_i]$ can be reachable from the result $\rho'[x_i]$ of the block expression. Finally, none
of those labels can be in the set of objects deallocated by other deallocation commands.


Condition 5.1 (Deallocation Command Safety) The deallocation statement
$Dealloc(y_i)$, shown in the code fragment above, is safe if the following three conditions hold:

1. $\rho'[y_i] \cap I = \emptyset$ ($y_i$ is not inherited)
2. $\rho'[y_i] \cap R = \emptyset$ ($y_i$ does not escape)
3. $\forall y_j \neq y_i : \rho'[y_i] \cap (\rho'[y_j] \cup \Delta^-) = \emptyset$ ($y_i$ is not deallocated elsewhere)


If Condition 5.1 is satisfied, then Theorem 3.10 from Chapter 3 applies and it is guaranteed that 
this deallocation command will not lead to dangling pointer errors. 
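
The three checks of Condition 5.1 amount to set intersections, as the following Python sketch suggests. It is a minimal illustration under an assumed encoding (references are frozensets of labels, "N" and "B" are strings, the store maps labels to tuples); `labels_of`, `reachable`, and `dealloc_is_safe` are hypothetical names, not part of the compiler described here.

```python
def labels_of(value):
    """Label set of an abstract value; numbers and booleans carry no labels."""
    return set(value) if isinstance(value, frozenset) else set()

def reachable(value, store):
    """All labels reachable from an abstract value through the store."""
    seen, work = set(), list(labels_of(value))
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        for field in store.get(label, ()):
            work.extend(labels_of(field))
    return seen

def dealloc_is_safe(y, dealloc_vars, env, inherited, escaping, deallocated):
    """Check Condition 5.1 for Dealloc(y) in a block with the given bindings."""
    target = labels_of(env[y])
    if not target:
        return False                 # y must be bound to a tuple reference
    if target & inherited:           # condition 1: not inherited
        return False
    if target & escaping:            # condition 2: does not escape
        return False
    others = set(deallocated)        # condition 3: not deallocated elsewhere
    for other in dealloc_vars:
        if other != y:
            others |= labels_of(env[other])
    return not (target & others)

env = {"t": frozenset({"l0"}), "u": frozenset({"l1"}), "r": "N"}
store = {"l0": ("N", "N"), "l1": (frozenset({"l0"}), "N")}

# Block returns the number r: the tuple bound to t does not escape.
safe_if_r_returned = dealloc_is_safe(
    "t", ["t"], env, set(), reachable(env["r"], store), set())
# Block returns u, whose tuple refers to l0: the tuple bound to t escapes.
safe_if_u_returned = dealloc_is_safe(
    "t", ["t"], env, set(), reachable(env["u"], store), set())
```

Each failure mode overestimates lifetimes, so a `False` answer is always safe; it only means the deallocation cannot be verified.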


5.2 Choice of Procedure Arguments 


We need to determine the behavior of a procedure over all possible values to which the procedure 
could be applied in order to verify that the deallocation commands in that procedure are safe or in 
order to insert safe deallocation commands. One way to do this is to analyze a procedure when it is
applied to the least upper bound of all the abstract values in the domain of that procedure. Another 
choice of input values would be the least upper bound of the values in the interesting portion of 
the domain of the procedure. We discuss the use of the interesting domain of the procedure in 
this section, discuss how it sometimes prevents us from verifying deallocation commands that are 
actually correct, and then develop a better set of input values for use during analysis. 


5.2.1 Most General Input Values 


Let us consider the analysis of the body of a procedure when applied to the least upper bound of 
the values in the interesting domain of that procedure. For example, let us analyze the procedure 
foo, defined below: 


def foo (w,n) =
  { a = Select_1(w);
    b = Select_2(w);
    c = a + 1;
    p = c < n;
    r = if p then
          { w' = ^{l_0}MakeTuple(c,b);
            r' = foo(w');
           in r' };
        else b;
   in r };


where the interesting portion of the domain of foo might be:

\[
\begin{aligned}
\{\ & (\bot,\ \bot,\ \bot_{Store}), \\
& (\{l_1\},\ N,\ [l_1 \mapsto (Tuple\ N, N)]), \\
& (\{l_0\},\ N,\ [l_1 \mapsto (Tuple\ N, N),\ l_0 \mapsto (Tuple\ N, N)])\ \}
\end{aligned}
\]

where the tuple labeled $l_1$ was allocated somewhere else in the program.

The least upper bound of these values is the triple:

\[ (\{l_0, l_1\},\ N,\ [l_1 \mapsto (Tuple\ N, N),\ l_0 \mapsto (Tuple\ N, N)]) \]


If we examine this program by hand, we see that the lifetime of the object bound to w’ is contained 
in the lifetime of the letrec block in the then side of the conditional, because foo never returns 
its input — so w’ does not escape — and w’ is always bound to a freshly allocated tuple — so w’ 
is allocated within the letrec block. 


However, if we apply the procedure foo to the most general input value that we constructed above,
we find the following possibilities for the bindings of variables $w$, $w'$ and $r'$ in environment $\rho'$ within
the body of foo:

\[
\begin{aligned}
\rho'[w] &= \{l_0, l_1\} \\
\rho'[w'] &= \{l_0\} \\
\rho'[r'] &= N
\end{aligned}
\]
The set $I$ of labels that may be inherited by procedure foo is the set $\{l_0, l_1\}$, and the set of labels
that may escape from foo is empty. Thus, even though we can determine by inspection that $w'$ is
never bound to an object that escapes from the letrec block, we find that $w'$ may be bound to
an object inherited by procedure foo. We cannot determine that the lifetime of $w'$ is contained in
the inner letrec block using this approach to lifetime analysis.


In this example, we have come to a safe conclusion. It is always safe to overestimate the lifetime 
of an object, but if the overestimates are too large we will never be able to verify or insert any 
deallocation commands. In fact, if we follow this strategy of using the most general input value
when we analyze a procedure, we will never be able to verify the deallocation of a structure that is created
by a procedure, passed to a recursive call, but not returned from the procedure. The reason is that 
passing an object label to a recursive call guarantees that the label is in the most general input 
value; therefore, the label will always be considered inherited by the procedure. 


5.2.2 Desired Properties for Input Values 


What are the important properties of the input values to which we apply a procedure during 
analysis? From the standpoint of lifetime analysis, the most important thing we know about these 
values is that they came from outside the procedure, and that if some variable within the procedure 
may be bound to one of these values, then the lifetime of the object to which that variable is bound 
may not be enclosed by the lifetime of the procedure. So we desire to choose input values in 
such a way that we can determine which variables are “contaminated” by input values, without 
getting any spurious contamination signals. We must also choose input values so that we never 
miss any contamination signals. Therefore, we cannot use bottom as an input value when analyzing 
a procedure. 


The input values we choose must also have the right type. We cannot apply a function to a number 
if it expects a tuple. 


Finally, we must be able to show that analysis of the body of a function with respect to some input 
value yields correct values for all possible values to which the function could be applied under the 
standard interpreter. 


5.2.3 Representative Input Values 


In this section we present a method for creating representative input values that allow us to avoid 
the false contamination we found in Section 5.2.1. We show that analysis performed with respect 
to these input values is safe for all possible inputs, up to renaming of the inputs. 


Let us analyze procedure foo when applied to the following representative value $v_{rep}$:

\[ (\{l_{-1}\},\ N,\ [l_{-1} \mapsto (Tuple\ N, N)]) \]

where label $l_{-1}$ is a new label that does not occur anywhere in the program.

Now if we evaluate the body of foo and determine the bindings of $w$, $w'$ and $r'$ we get the following
values:

\[
\begin{aligned}
\rho'[w] &= \{l_{-1}\} \\
\rho'[w'] &= \{l_0\} \\
\rho'[r'] &= N
\end{aligned}
\]

The set $I$ of labels that may be inherited by procedure foo is the set $\{l_{-1}\}$, and the set of labels
that escapes from foo is empty. In this case, the lifetime of the object to which $w'$ is bound is
contained in the lifetime of the inner letrec block.
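
The contrast between the two choices of input can be replayed in a few lines of Python. The bindings below are copied by hand from the analysis of foo in the text rather than computed by an interpreter, and `analyze_foo` and `appears_inherited` are hypothetical helpers introduced only for this illustration.

```python
def analyze_foo(w_labels):
    """Hand-derived bindings inside foo for a given abstract argument w."""
    env = {"w": set(w_labels), "w'": {"l0"}, "r'": "N"}
    inherited = set(w_labels)   # the input tuples hold only numbers
    return env, inherited

def appears_inherited(var, env, inherited):
    """True if some label bound to var may come from outside the procedure."""
    return bool(env[var] & inherited)

# Most general input value: w may refer to l0 or l1.
env_general, inherited_general = analyze_foo({"l0", "l1"})
# Representative input value: w refers to the fresh label l-1.
env_rep, inherited_rep = analyze_foo({"l-1"})

contaminated_general = appears_inherited("w'", env_general, inherited_general)
contaminated_rep = appears_inherited("w'", env_rep, inherited_rep)
```

With the most general input, the label l0 appears on both sides and w' looks inherited; with the fresh label, the spurious overlap disappears.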


Name Invariance 


The question we must now answer is whether the behavior of foo when applied to this input value
tells us anything about the behavior of foo when applied to other values. If we want to determine
the behavior for an input vector that contains $l_i$ instead of $l_{-1}$, such as the following input vector
$v$:

\[ (\{l_i\},\ N,\ [l_i \mapsto (Tuple\ N, N)]) \]

then we can take the bindings computed for foo applied to input vector $v_{rep}$, rename all occurrences
of $l_{-1}$ to $l_i$, and end up with the bindings for foo applied to $v$. For instance, if we rename $l_{-1}$ to
$l_i$ in our analysis of foo with respect to the representative value $v_{rep}$, we obtain the following
information about the environment in the body of foo:

\[
\begin{aligned}
\rho'[w] &= \{l_i\} \\
\rho'[w'] &= \{l_0\} \\
\rho'[r'] &= N
\end{aligned}
\]

which is exactly the result we would obtain if we directly analyzed procedure foo applied to $v$.


If more than one location was passed as an argument, then we substitute the set of labels for the
set containing $l_{-1}$, and duplicate the bindings of $l_{-1}$ in the store for each of the labels in the desired
input value.
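
The renaming step can be sketched in Python as a substitution over values, environments, and stores. This is an illustration under an assumed encoding (references as frozensets of labels, stores as dictionaries from labels to tuples), and it further assumes that the replacement labels do not already occur in the store; the function names are hypothetical.

```python
def subst_value(ls, lo, v):
    """Replace label lo by the label set ls inside an abstract value."""
    if isinstance(v, frozenset) and lo in v:
        return (v - {lo}) | ls
    return v                      # "N", "B", or a label set without lo

def subst_env(ls, lo, env):
    """Apply the substitution to every binding in an environment."""
    return {x: subst_value(ls, lo, v) for x, v in env.items()}

def subst_store(ls, lo, store):
    """Rename inside every tuple and duplicate lo's tuple for each label in ls.

    Assumption: no label in ls is already a key of the store.
    """
    out = {}
    for label, fields in store.items():
        new_fields = tuple(subst_value(ls, lo, f) for f in fields)
        for target in (ls if label == lo else {label}):
            out[target] = new_fields
    return out

L_REP = "l-1"                     # the fresh representative label
env = {"w": frozenset({L_REP}), "w'": frozenset({"l0"}), "r'": "N"}
store = {L_REP: ("N", "N")}

actual = frozenset({"l1", "l2"})  # two locations passed as the argument
new_env = subst_env(actual, L_REP, env)
new_store = subst_store(actual, L_REP, store)
```

Bindings that never mention the representative label, such as the one for w', are left unchanged by the substitution.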


We would like to show that we can take an appropriate representative input value, analyze a 
function with respect to that input value, and determine safe behavior for that function applied to 
any input value given the behavior of the function over the representative input value. In order for 
the behavior of a function over its representative input value to tell us anything about the behavior 
of the function over other input values, the representative input value must satisfy the following 
three conditions: 


1. The representative value must have the same type as all other input values to this function. 
2. None of the values reachable from the representative input value may be bottom.


3. The labels used in the representative input value must be distinct from all labels occurring 
statically in the program. 


The first condition is almost redundant; all values to which a function is applied must have the 
same type. 


The second condition is required because the result of a function applied to any value that is in 
some way “less than” the representative input value will be less than the result of the function 
applied to the representative input value. If some component of the representative input value 
is bottom, then any input that has a non-bottom value for that component will cause different 
behavior. 


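The third condition is required because we would like to be able to perform a substitution on the
result of a function applied to the representative input value in order to obtain the result of the


$$\begin{aligned}
\mathit{Subst}_{V}(ls, l_o, N) &= N \\
\mathit{Subst}_{V}(ls, l_o, B) &= B \\
\mathit{Subst}_{V}(ls, l_o, ls') &= \begin{cases} (ls' - \{l_o\}) \cup ls & \text{if } l_o \in ls' \\ ls' & \text{otherwise} \end{cases} \\
\mathit{Subst}_{Ls}(ls, l_o, ls'') &= \begin{cases} (ls'' - \{l_o\}) \cup ls & \text{if } l_o \in ls'' \\ ls'' & \text{otherwise} \end{cases} \\
\mathit{Subst}_{Env}(ls, l_o, \rho) &= \lambda x.\ \mathit{Subst}_{V}(ls, l_o, \rho[x]) \\
\mathit{Subst}_{SV}(ls, l_o, (\mathrm{Tuple}\ v_1, \cdots, v_m)) &= (\mathrm{Tuple}\ \mathit{Subst}_{V}(ls, l_o, v_1), \cdots, \mathit{Subst}_{V}(ls, l_o, v_m)) \\
\mathit{Subst}_{Store}(ls, l_o, \sigma) &= \lambda l.\ \begin{cases} \mathit{Subst}_{SV}(ls, l_o, \sigma[l_o]) & \text{if } l \in ls \\ \mathit{Subst}_{SV}(ls, l_o, \sigma[l]) & \text{otherwise} \end{cases}
\end{aligned}$$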


Substreny (Is, lo, ©) = 


| | ( | | LrEnv [fi — [Subst (Is, 1,0) > seat a 8/0 


fi \@st. Off AL 


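$$\begin{aligned}
\mathit{Subst}^*(\{(ls_1, l_1), \cdots, (ls_n, l_n)\}, v) &= \mathit{Subst}^*(\{(ls_1, l_1), \cdots, (ls_{n-1}, l_{n-1})\},\ \mathit{Subst}(ls_n, l_n, v)) \\
\mathit{Subst}^*(\emptyset, v) &= v
\end{aligned}$$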


Figure 5.1: Definition of procedure Subst 


function applied to some other value. We will not get a correct result if we rename the objects 
allocated within the procedure call; we only want to rename or substitute for values passed as input 
to the function. After all, the most important thing we know about the representative input values 
is that they came from outside the procedure. 


We define procedure Subst in Figure 5.1. This procedure takes a set of labels $ls$, a label $l_o$, and a
value $v$, and substitutes $ls$ for all occurrences of label $l_o$ in $v$. We define versions of Subst that operate
on denotable values, storable values, environments, stores, and function environments.


Let RIV be a procedure that takes a function's type and constructs a representative input vector
for that function that satisfies the three conditions given above. Let Match be a procedure that
takes an input vector $iv$ to a function and a function's representative input vector $riv$, and produces
a substitution $\theta$ such that


$$iv \sqsubseteq \mathit{Subst}^*(\theta, riv).$$
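

The substitution of Figure 5.1 can be illustrated with a small Python sketch. The encoding here is an assumption made only for illustration: scalars are the strings "N" and "B", a set of labels is a frozenset of strings, and a store is a dictionary from labels to tuples of values. The function names are hypothetical and the function-environment case is not covered.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Labels = FrozenSet[str]
Value = Union[str, Labels]            # "N", "B", or a set of object labels
Store = Dict[str, Tuple[Value, ...]]  # label -> tuple of component values


def subst_value(ls: Labels, lo: str, v: Value) -> Value:
    """Replace label `lo` by the label set `ls` inside one abstract value."""
    if isinstance(v, frozenset) and lo in v:
        return (v - {lo}) | ls
    return v  # scalars and sets that do not mention `lo` are unchanged


def subst_store(ls: Labels, lo: str, store: Store) -> Store:
    """Rename inside every stored tuple, then copy the binding of `lo`
    to each label in `ls` (the store clause of the figure)."""
    def rename(fields: Tuple[Value, ...]) -> Tuple[Value, ...]:
        return tuple(subst_value(ls, lo, f) for f in fields)

    out: Store = {label: rename(fields) for label, fields in store.items()}
    if lo in store:
        for target in ls:
            out[target] = rename(store[lo])
    return out


def subst_star(theta: List[Tuple[Labels, str]], v: Value) -> Value:
    """Apply a list of (label set, old label) pairs, last pair first."""
    for ls, lo in reversed(theta):
        v = subst_value(ls, lo, v)
    return v
```

Renaming the representative label in a binding such as {l_-1} yields {l_1}, while a label allocated inside the procedure, such as l_0, is left alone.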


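Theorem 5.2 (Name Invariance) Given program pr and the function environment $\Phi$ for that
program:


$$\begin{aligned}
&\forall f_i \in pr,\ \exists\, riv = \mathcal{RIV}[\![ \mathit{TypeOf}(f_i, pr) ]\!], \\
&\forall iv \in \mathit{Domain}(f_i),\ \exists\, \theta = \mathit{Match}(iv, riv), \\
&iv \sqsubseteq \mathit{Subst}^*(\theta, riv) \Rightarrow \Phi[f_i][iv] \sqsubseteq \mathit{Subst}^*(\theta, \Phi[f_i][riv])
\end{aligned}$$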


Proof:
Sketch of proof by contradiction.
Let:

$$\begin{aligned}
r_{iv} &= \Phi[f_i][iv] \\
r_{riv} &= \Phi[f_i][riv]
\end{aligned}$$


Assume that: $\Phi[f_i][iv] \not\sqsubseteq \mathit{Subst}^*(\theta, \Phi[f_i][riv])$


Every portion of the results of function applications, in this case $r_{iv}$ and $r_{riv}$, either came
from the input to the function or was created within the function.


- The representative input vector $riv$ is at least as well defined as the input vector of
interest $iv$, and so execution of the body of the function should have proceeded at least
as far in the case of $riv$ as in the case of $iv$. Therefore, all portions of the result should
be at least as well defined. For that reason, all portions of the result that came from
the input should be at least as well defined in the case of $\mathit{Subst}^*(\theta, r_{riv})$ as in the case
of $r_{iv}$, unless some of the inputs reachable under one case were not reachable under the
other. But the abstract interpreter preserves reachability, so all of the components of
$\mathit{Subst}^*(\theta, r_{riv})$ that were inherited from the input vector $riv$ must contain the portions
of the result $r_{iv}$ that came from the input $iv$, because input $\mathit{Subst}^*(\theta, riv)$ contained all
of input $iv$.

Contradiction.


- Again, all code in the body of function $f_i$ must have executed at least as far when
applied to $riv$ as it did when applied to $iv$, because $riv$ is more defined (has more non-bottom
components). Therefore all portions of result $r_{riv}$ that were created within the
function body should be at least as well defined as the portions of result $r_{iv}$ that were
created within the function body. Furthermore, all of these values that are object labels
must be the same in both cases, because all object labels depend solely on the text of
the program, not the inputs to the function. Therefore, it must be the case that the
portions of result $r_{riv}$ that were created within the function must contain the portions
of $r_{iv}$ that were created within the function, even before substitution. Furthermore,
none of the labels being renamed by substitution $\theta$ are created within the body of the
function, since the contract of RIV is to use labels from outside of the program.


Contradiction.

Both of these paths lead to contradiction, so our assumption must be false.


There remains the question of determining if label $l_0$ can ever, under the concrete interpreter, both
be allocated within the inner letrec block and be inherited by the body of the same instance
of procedure foo. The only way an object can be allocated within an activation of a
procedure and passed into the same activation of the procedure is if the object is returned as
part of the result and the caller of the procedure passes that value back into the procedure as
an argument. Under this condition, the object's lifetime cannot be bounded by the lifetime of
the procedure, because the object escapes as part of the procedure's result. If the object is never
returned as part of the result, then any object passed into the procedure with the same label as an
object allocated within the procedure must be an instance of an object allocated within a different
activation of the same procedure.


Theorem 5.2 has two consequences. First, it allows us to construct a representative input value for 
each function and analyze the function applied to that representative input in order to verify or 
insert deallocation commands in the body of the function. It shows us that the representative input 
vector is equivalent to the most general input vector in the most important way: distinguishing the 
values that came from outside the function from those that were created within the function. The 
use of representative input vectors in many cases allows us to avoid the false aliasing problem that 
reduces the effectiveness of the deallocation safety verification and deallocation insertion algorithms. 


Second, it allows us to derive a conservative approximation of the result of a function applied 
to a particular input value from the result of the function applied to the representative input. 
Theorem 5.2 guarantees that if the representative input is chosen appropriately, then the result 
after substitution will be an approximation of the actual result. 


Two problems remain: how to choose the representative input for a given function and how to 
choose substitutions. 


Constructing Representative Input Values for Functions 


Our approach is to generate an input value for each procedure based on its type so that if we 
analyze or transform the function for that input value the result will be correct for all input values. 
This allows us to ask more precise questions about the function because we can guarantee that 
there is no false aliasing between the inputs to the function and any structures it may allocate. 


The type of a function is a member of the domain FunctionType, defined below. Argument and
result types of a function are drawn from the domain Type.


$$\begin{aligned}
\mathit{FunctionType} &= (\mathit{Type} \times \cdots \times \mathit{Type}) \rightarrow \mathit{Type} \\
\tau \in \mathit{Type} &= N \mid B \mid (\mathrm{tuple}\ \mathit{Type} \times \cdots \times \mathit{Type})
\end{aligned}$$


The function RIV takes a function type and returns a representative input value: a tuple of abstract
values and an abstract store. This procedure makes use of function CV, which constructs a single
value-store pair from a single type. The signatures of functions RIV and CV are given below:


$$\begin{aligned}
\mathcal{RIV} &: \mathit{FunctionType} \rightarrow (V \times \cdots \times V \times \mathit{Store}) \\
\mathcal{CV} &: \mathit{Type} \rightarrow (V \times \mathit{Store})
\end{aligned}$$

Finally, we need a function, TypeOf, which gives us the type of a procedure in the program pr.
TypeOf takes a function identifier and a program and returns the type of that function.


$$\mathit{TypeOf} : F \rightarrow \mathit{Prog} \rightarrow \mathit{FunctionType}$$


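The procedure CV, in the case of a scalar type, returns N or B, as appropriate, and the empty
store. In the case of a tuple type, CV is called recursively on each of the component types, a new
label $l$ is allocated, and the set containing $l$ is returned as the value. The resulting store is the least
upper bound of each of the component stores with location $l$ bound to the tuple of the component
values.


$$\begin{aligned}
\mathcal{CV}[\![ N ]\!] &= (N, \bot_{Store}) \\
\mathcal{CV}[\![ B ]\!] &= (B, \bot_{Store}) \\
\mathcal{CV}[\![ (\mathrm{tuple}\ \tau_1, \cdots, \tau_n) ]\!] &= \{\ (v_1, \sigma_1) = \mathcal{CV}[\![ \tau_1 ]\!]; \\
&\qquad \vdots \\
&\quad\ (v_n, \sigma_n) = \mathcal{CV}[\![ \tau_n ]\!]; \\
&\quad\ \sigma' = \bigsqcup_i \sigma_i; \\
&\quad\ l = \mathit{Newloc}(); \\
&\quad\ \text{in } (\{l\},\ \sigma'[l \mapsto (\mathrm{tuple}\ v_1, \cdots, v_n)])\ \}
\end{aligned}$$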


We call function Newloc to give us a label that cannot appear in the program; this guarantees
that we do not get any false aliasing between the initial arguments to a function and the object
labels allocated within the function.


The function RIV takes the tuple of function argument types and calls function CV to construct a
value and store for each of those types. It returns a tuple containing each of the values and the least
upper bound of the stores. Note that because we construct these stores with disjoint locations, the
least upper bound of these stores is the same as the concatenation of the stores.


$$\begin{aligned}
\mathcal{RIV}[\![ (\tau_1 \times \cdots \times \tau_m) \rightarrow \tau ]\!] &= \{\ (v_1, \sigma_1) = \mathcal{CV}[\![ \tau_1 ]\!]; \\
&\qquad \vdots \\
&\quad\ (v_m, \sigma_m) = \mathcal{CV}[\![ \tau_m ]\!]; \\
&\quad\ \sigma' = \bigsqcup_i \sigma_i; \\
&\quad\ \text{in } (v_1, \cdots, v_m, \sigma')\ \}
\end{aligned}$$
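

As a rough illustration of how CV and RIV build a representative input from a type, here is a Python sketch. The encoding is assumed for the example only: a type is "N", "B", or a Python tuple of component types; a store is a dictionary; and the hypothetical helper newloc produces labels with negative indices to stand for labels that cannot occur in program text.

```python
import itertools
from typing import Dict, FrozenSet, List, Sequence, Tuple, Union

Type = Union[str, Tuple["Type", ...]]   # "N", "B", or a tuple of component types
Value = Union[str, FrozenSet[str]]
Store = Dict[str, Tuple[Value, ...]]

_fresh = itertools.count(1)


def newloc() -> str:
    # Assumption: labels of the form l_-k never appear statically in a program.
    return f"l_-{next(_fresh)}"


def cv(ty: Type) -> Tuple[Value, Store]:
    """Build one (value, store) pair for a single type."""
    if ty in ("N", "B"):
        return ty, {}                    # scalar: the value itself and the empty store
    values: List[Value] = []
    store: Store = {}
    for component in ty:
        v, s = cv(component)
        values.append(v)
        store.update(s)                  # labels are disjoint, so the join is a plain union
    label = newloc()
    store[label] = tuple(values)
    return frozenset({label}), store


def riv(arg_types: Sequence[Type]) -> Tuple[Tuple[Value, ...], Store]:
    """Build the representative input vector for a list of argument types."""
    values: List[Value] = []
    store: Store = {}
    for ty in arg_types:
        v, s = cv(ty)
        values.append(v)
        store.update(s)
    return tuple(values), store
```

For a procedure taking a pair of numbers and a number, riv([("N", "N"), "N"]) returns one fresh singleton label set bound in the store to ("N", "N"), together with the scalar "N".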


5.3 An Algorithm for Verifying Deallocation Commands 


This section defines VP, an algorithm that takes a program pr and returns the set of expression
labels of possibly incorrect deallocation commands in the program. This algorithm errs on the
conservative side: it may return labels of deallocation commands that would never cause dangling pointer
errors at run-time, but it never returns the empty set for programs that are incorrect.


Procedure VP verifies programs using monotonic reasoning: it first assumes all procedures are
correct and iteratively improves this approximation until it finds all the procedures that could
have dynamic errors deallocating structures. We use a new mapping, a correctness map, that
gives the most up-to-date information about the correctness of each procedure. A correctness map
takes a procedure name and returns either $\emptyset$ if the procedure contains only correct deallocation
commands or a non-empty set of incorrect deallocation command labels otherwise.


$$\Psi \in \mathit{CMAP} = F \rightarrow \mathcal{P}(L)$$

The function VE verifies that all deallocation commands within an expression are correct with
respect to a given environment, store, and function environment.
Here are the signatures of the verification procedures:


$$\begin{aligned}
\mathcal{VP} &: \mathit{Prog} \rightarrow \mathcal{P}(L) \\
\mathcal{VE} &: E \rightarrow \mathit{Env} \rightarrow \mathit{Store} \rightarrow \mathit{FEnv} \rightarrow \mathit{CMAP} \rightarrow \mathcal{P}(L)
\end{aligned}$$


These functions are defined in the following two sections. 


5.3.1 Verification of Deallocation within a Program 


The function VP verifies that all of the deallocation statements in the main body of the program
and in each of the functions of the program cannot lead to dereferencing a deallocated location
under any execution of the program. The definition of VP is shown below:


$$\begin{aligned}
\mathcal{VP}[\![ pr ]\!] = \{\ & [\![ \{ \cdots f_i(x_1, \cdots, x_n) = e_i \cdots \} ]\!] = pr; \\
& (\Phi_0, D_0) = \mathit{InitialFEnv}(pr); \\
& (\Phi, D) = \mathit{ComputeFEnv}(pr, \Phi_0, D_0); \\
& \Psi_0[f_i] = \emptyset \quad \forall f_i \in F; \\
& \Psi = \mathit{ComputeCMAP}(pr, \Phi, D, \Psi_0); \\
& ds = \mathcal{VE}[\![ e_0 ]\!]\,\bot_{Env}\,\bot_{Store}\,\Phi\,\Psi; \\
& \text{in } ds\ \}
\end{aligned} \qquad (5.1)$$


where expression $e_0$ is the body of the main function $f_0$. Procedure VP calls procedure ComputeFEnv
to compute the function environment and the interesting domain map of the program.
Then it calls ComputeCMAP to compute the correctness map for the program. The CMAP $\Psi$
takes function names and returns the list of expression labels of deallocation commands that may
be incorrect. Finally, procedure VP calls procedure VE to verify the correctness of expression $e_0$,
the body of the main procedure $f_0$. If there are no incorrect deallocation commands that may be
called from the main body of the program, then all deallocation commands in the program must
be correct (or else unreachable from the main procedure).


We revise the function InitialFEnv that takes a program and returns a function environment and
a domain map. The empty function environment is returned as the initial function environment.
The domain map we return maps each function name to the set containing the representative input
value for that function.


$$\begin{aligned}
\mathit{InitialFEnv}(pr) = \{\ & \Phi_0 = \bot_{FEnv}; \\
& \forall f_i \in F: \\
& \quad \tau_{f_i} = \mathit{TypeOf}(f_i, pr); \\
& \quad (v_{i,1}, \cdots, v_{i,n}, \sigma_i) = \mathcal{RIV}[\![ \tau_{f_i} ]\!]; \\
& D_0[f_i] = \{(v_{i,1}, \cdots, v_{i,n}, \sigma_i)\} \quad \forall f_i \in F \\
& \text{in } (\Phi_0, D_0)\ \}
\end{aligned}$$


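The function ComputeCMAP iteratively improves the approximations of the correctness map until
no further information is added. It returns the most precise approximation as its result.


$$\begin{aligned}
\mathit{ComputeCMAP}(pr, \Phi, D, \Psi) = \{\ & \Psi' = \lambda f_i.\ \bigcup_{(v_1, \cdots, v_n, \sigma) \in D[f_i]} \mathcal{VE}[\![ e_i ]\!]\,\bot_{Env}[v_1/x_1, \cdots, v_n/x_n]\,\sigma\,\Phi\,\Psi; \\
& \Psi'' = \text{if } \Psi' \sqsubseteq \Psi \\
& \qquad\quad \text{then } \Psi' \\
& \qquad\quad \text{else } \mathit{ComputeCMAP}(pr, \Phi, D, \Psi'); \\
& \text{in } \Psi''\ \}
\end{aligned} \qquad (5.2)$$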
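

The iteration performed by ComputeCMAP can be pictured with the following Python sketch. It is a simplification under stated assumptions: the hypothetical callback verify_body stands in for applying VE to a procedure body over every input in its domain, the map is grown by accumulation so that the loop is monotone, and the three-procedure call graph at the bottom is invented purely for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable

CMap = Dict[str, FrozenSet[str]]  # procedure name -> labels of suspect Dealloc commands


def compute_cmap(functions: Iterable[str],
                 verify_body: Callable[[str, CMap], FrozenSet[str]]) -> CMap:
    """Start from 'every procedure is correct' and iterate to a fixed point."""
    cmap: CMap = {f: frozenset() for f in functions}
    while True:
        new = {f: cmap[f] | verify_body(f, cmap) for f in cmap}
        if new == cmap:          # no further information was added
            return cmap
        cmap = new


# Illustrative data (an assumption): h calls f, and f contains one bad command d1.
CALLS = {"f": [], "g": [], "h": ["f"]}
LOCAL_ERRORS = {"f": frozenset({"d1"}), "g": frozenset(), "h": frozenset()}


def verify_body(name: str, cmap: CMap) -> FrozenSet[str]:
    """Local errors plus whatever the current map reports for each callee."""
    found = LOCAL_ERRORS[name]
    for callee in CALLS[name]:
        found |= cmap[callee]
    return found


RESULT = compute_cmap(CALLS, verify_body)
```

In this run the error in f reaches the entry for its caller h on the second pass, and the third pass detects that nothing changed.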


5.3.2 Verification of Deallocation within an Expression 


This section gives a definition of algorithm VE, which takes an expression e, an environment, a
store, and a function environment, and returns the set of labels of deallocation commands in e that
cannot be proven safe statically.


The following four clauses of VE show that VE returns the empty set for simple expressions and
primitive expressions because none of these expressions can deallocate objects.


$$\begin{aligned}
\mathcal{VE}[\![ n ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.3) \\
\mathcal{VE}[\![ b ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.4) \\
\mathcal{VE}[\![ x ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.5) \\
\mathcal{VE}[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.6)
\end{aligned}$$


Function VE looks in the correctness map $\Psi$ to see if an application of procedure f is correct with
respect to deallocation.


$$\mathcal{VE}[\![ f(se_1, \cdots, se_n) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \Psi[f] \qquad (5.7)$$


Verification that a function contains only correct deallocation statements is performed by procedure VP,
which tests the correctness of each function over all points in the function's domain.


Conditional statements have correct deallocation statements if both branches of the conditional
are correct. The predicate cannot have any deallocation statements. The following clause verifies
conditional expressions.


$$\mathcal{VE}[\![ \mathrm{if}(se_0, e_1, e_2) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \mathcal{VE}[\![ e_1 ]\!]\,\rho\,\sigma\,\Phi\,\Psi\ \cup\ \mathcal{VE}[\![ e_2 ]\!]\,\rho\,\sigma\,\Phi\,\Psi \qquad (5.8)$$


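The essence of the verification procedure for expressions is in the clause shown in Figure 5.2.
This clause verifies the deallocation commands in letrec blocks. This clause must compute the


$$\begin{aligned}
&\mathcal{VE}[\![ \{ Bs\ \text{---}\ Ds\ \text{in}\ x \} ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \qquad (5.9) \\
&\{\ [\![ x_1 = e_1; \cdots; x_n = e_n ]\!] = Bs; \\
&\quad [\![ {}^{d_1}\mathrm{Dealloc}(y_1); \cdots; {}^{d_k}\mathrm{Dealloc}(y_k) ]\!] = Ds; \\
&\quad \rho_0 = \rho[\bot/x_1, \cdots, \bot/x_n]; \\
&\quad (\rho', \sigma', \Delta', P') = \mathit{EvalBindings}(Bs, \Phi, \rho_0, \sigma, \bot_{pmap}); \\
&\quad I = \bigcup_{w \in FV(Bs)} \mathit{Reachable}(\rho[w], \sigma); \\
&\quad R = \mathit{Reachable}(\rho'[x], \sigma'); \\
&\quad ds_{Bs} = \bigcup_{1 \le i \le n} \mathcal{VE}[\![ e_i ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi; \\
&\quad ds_{Ds} = \bigcup_{[\![ {}^{d_i}\mathrm{Dealloc}(y_i) ]\!] \in Ds}
\begin{cases}
\emptyset & \text{when } \emptyset = \rho'[y_i] \cap I\ \wedge\ \emptyset = \rho'[y_i] \cap R\ \wedge\ \emptyset = \rho'[y_i] \cap \Delta' \\
\{d_i\} & \text{otherwise}
\end{cases} \\
&\quad \text{in } ds_{Bs} \cup ds_{Ds}\ \}
\end{aligned}$$


Figure 5.2: Clause to verify deallocation commands in letrec blocks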


environment $\rho'$ and store $\sigma'$ of the letrec block bindings. It calls VE on each of the right-hand-side
expressions with respect to $\rho'$ and $\sigma'$, collecting the results into $ds_{Bs}$. Then, it checks whether
each deallocation statement labeled $d_i$ in $Ds$ is safe. If VE cannot prove that the deallocation
statement labeled $d_i$ is safe, it collects the $d_i$ into the set $ds_{Ds}$. The result is the union of the unsafe
deallocation statement labels in the body of the block and the unsafe deallocation statement labels
in $Ds$. While computing set $I$, the set of object labels reachable from the surrounding context of
a block expression, note that we use the incoming environment and store, $\rho$ and $\sigma$, instead of the
current environment and store, $\rho'$ and $\sigma'$. We can use either $\sigma$ or $\sigma'$ here because the language is
functional.
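

The check applied to each deallocation statement can be sketched in Python as follows. This is only an approximation of the clause: it covers the inherited set I and the escaping set R, it assumes the environments and stores have already been computed, and it uses an invented encoding (label sets as frozensets, stores as dictionaries from labels to tuples of values). All function and parameter names are hypothetical.

```python
def reachable(value, store):
    """Labels reachable from an abstract value by following the abstract store."""
    seen = set()
    work = list(value) if isinstance(value, frozenset) else []
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        for field in store.get(label, ()):
            if isinstance(field, frozenset):
                work.extend(field)
    return seen


def unsafe_deallocs(deallocs, env_in, store_in, env, store, free_vars, result_var):
    """Return the labels of Dealloc commands that cannot be shown safe.

    deallocs          -- list of (command label, variable) pairs
    env_in, store_in  -- environment and store on entry to the block
    env, store        -- environment and store after the bindings
    """
    inherited = set()
    for w in free_vars:                       # set I: reachable from the context
        inherited |= reachable(env_in[w], store_in)
    escaping = reachable(env[result_var], store)   # set R: reachable from the result
    bad = set()
    for label, var in deallocs:
        targets = env[var] if isinstance(env[var], frozenset) else frozenset()
        if targets & inherited or targets & escaping:
            bad.add(label)
    return bad
```

A block that allocates a tuple, passes it to a procedure returning a scalar, and then deallocates it yields the empty set; if the block's result may be that same tuple, the command's label is reported.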


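Procedure VE returns the empty set for tuple allocation and selection primitives, as shown below,
because they cannot contain deallocation statements.


$$\begin{aligned}
\mathcal{VE}[\![ {}^{l}\mathrm{MakeTuple}(se_1, \cdots, se_m) ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.10) \\
\mathcal{VE}[\![ \mathrm{Select}_i(se) ]\!]\,\rho\,\sigma\,\Phi\,\Psi &= \emptyset & (5.11)
\end{aligned}$$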

5.3.3 Verifying Some Examples 


Now let us apply the above algorithm to both a correct and an incorrect example so that we may
observe its operation.


A Correct Example


In this example, we apply procedure VP to pr, the following KID7~ program: 


{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = g(t);


      ^{d1}Dealloc(t);
    in result };


  def g(t) =
    Select_1(t);


  def f0() =
    f(6,847)
}

Here are the domains of the abstracted versions of f and g.


$$\begin{aligned}
\mathit{Domain}(f) &= N \times N \times \bot_{Store} \\
\mathit{Domain}(g) &= \{\emptyset, \{l_0\}\} \times \{[l_0 \mapsto (\mathrm{Tuple}\ N, N)],\ [l_0 \mapsto \bot]\}
\end{aligned}$$

where the following are the program dependent domains:


$$\begin{aligned}
L &= \{l_0\} \\
Ls &= \{\emptyset, \{l_0\}\}
\end{aligned}$$

In order to verify the correctness of the deallocation commands in program pr, we must verify the
deallocation commands in each procedure in the program for all input vectors in the domains of
those procedures (Equation 5.1). For expository purposes, we short cut the iterative computation
of the correctness map $\Psi$.


First, we verify the correctness of procedure f. There is only one input vector of interest for
procedure f, so the value of $\Psi[f]$ is the result of VE applied to the body of f and this input vector.


$$\Psi[f] = \mathcal{VE}[\![ e_f ]\!]\,\bot_{Env}[N/x, N/y]\,\bot_{Store}\,\Phi\,\Psi$$


e_f = { t = ^{l0}MakeTuple(x,y);
        result = g(t);
        ^{d1}Dealloc(t);
      in result };


First, let us compute the value of $ds_{Bs}$, the labels of the incorrect deallocation commands in the
body of the letrec block. We find:


$$ds_{Bs} = \mathcal{VE}[\![ {}^{l_0}\mathrm{MakeTuple}(x, y) ]\!]\,\rho'\sigma'\Phi\Psi\ \cup\ \mathcal{VE}[\![ g(t) ]\!]\,\rho'\sigma'\Phi\Psi$$

by Equation 5.9, where $e_f$ is the body of procedure f and $\rho'$, $\sigma'$, $\Delta'$, $I$, $R$ and $ls_1$ are computed
by procedure VE:


$$\begin{aligned}
\rho' &= \bot_{Env}[N/x,\ N/y,\ \{l_0\}/t,\ N/result] \\
\sigma' &= \bot_{Store}[l_0 \mapsto (\mathrm{Tuple}\ N, N)] \\
\Delta' &= \emptyset \\
I &= \mathit{Reachable}(\rho'[x], \sigma) \cup \mathit{Reachable}(\rho'[y], \sigma) = \emptyset \\
R &= \mathit{Reachable}(\rho'[result], \sigma') = \emptyset \\
ls_1 &= \rho'[t] = \{l_0\}
\end{aligned}$$


Using these values, we can check the correctness of the two expressions from the body of the block
expression. There are no incorrect deallocation commands in the MakeTuple expression, so VE
returns the empty set. To apply VE to the application of procedure g, we must compute the entry
$\Psi[g]$.

If we apply procedure VE to the body of procedure g, VE returns the empty set because there are
no deallocation commands or procedure applications in the body of g. Consequently, the entry in
$\Psi$ for g contains the empty set.

$$\Psi[g] = \emptyset$$


Going back to the verification of the body of procedure f, we find:


$$\begin{aligned}
\mathcal{VE}[\![ {}^{l_0}\mathrm{MakeTuple}(x, y) ]\!]\,\rho'\sigma'\Phi\Psi &= \emptyset \\
\mathcal{VE}[\![ g(t) ]\!]\,\rho'\sigma'\Phi\Psi &= \Psi[g] = \emptyset
\end{aligned}$$


by Equations 5.10 and 5.7. Therefore, the bindings of the letrec block in f contain no unsafe
deallocation commands.


Now let us consider the deallocation command in the body of f. Using the values computed above,
we see that the set of labels that may be deallocated, $\{l_0\}$, has a null intersection with both $I$ and
$R$, which are the sets of inherited and escaping locations. Therefore, this deallocation command
satisfies the safety condition, so we can conclude that it will never lead to a run-time error.


Since both $ds_{Bs}$ and $ds_{Ds}$ are empty, the result of the call to VE on the body of f is the empty set,
and the entry in $\Psi$ for f also contains the empty set.


$$\Psi[f] = \emptyset$$


An Incorrect Example 


Now let us apply the verification algorithm to a program containing an incorrect deallocation
command. The program is a slight variation of the program from the previous example, in which
procedure g returns its argument. We apply VP to the following program pr:


{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = g(t);


      ^{d1}Dealloc(t);
    in result };


  def g(t) = t;


  def f0() =
    f(6,847)
}


Again, we verify the correctness of the deallocation commands in the program by verifying each of
the procedure bodies over each input in the domains of the function. The procedures in program
pr have the same domains as in the previous example.


Procedure g still contains no deallocation commands or procedure applications, so VE returns the
empty set when applied to the body of g. Therefore, the entry in $\Psi$ for g is the empty set.

$$\Psi[g] = \emptyset$$


We proceed to verify the safety of the deallocation commands in procedure f. There is one input
value in the domain of f to consider. We call VE on expression $e_f$ and input value $(N, N, \bot_{Store})$.


$$\mathcal{VE}[\![ e_f ]\!]\,[N/x, N/y]\,\bot_{Store}\,\Phi\,\Psi$$

where $e_f$ is the body of procedure f:


e_f = { t = ^{l0}MakeTuple(x,y);
        result = g(t);


        ^{d1}Dealloc(t);
      in result };


To apply VE to the body of procedure f, we first compute $\rho'$ and $\sigma'$:


$$\begin{aligned}
\rho' &= [N/x,\ N/y,\ \{l_0\}/t,\ \{l_0\}/result] \\
\sigma' &= [l_0 \mapsto (\mathrm{Tuple}\ N, N)] \\
\Delta' &= \emptyset
\end{aligned}$$

Then we compute $ds_{Bs}$, the labels of the incorrect deallocates in the bindings of the letrec block.


$$ds_{Bs} = \mathcal{VE}[\![ {}^{l_0}\mathrm{MakeTuple}(x, y) ]\!]\,\rho'\sigma'\Phi\Psi\ \cup\ \mathcal{VE}[\![ g(t) ]\!]\,\rho'\sigma'\Phi\Psi$$

by Equation 5.9. Using these values, we can see that the deallocations in the bindings of the letrec
block are correct, as in the previous example. Then we compute the other values needed:


$$\begin{aligned}
I &= \mathit{Reachable}(\rho'[x], \sigma') \cup \mathit{Reachable}(\rho'[y], \sigma') = \emptyset \\
R &= \mathit{Reachable}(\rho'[result], \sigma') = \{l_0\} \\
ls_1 &= \rho'[t] = \{l_0\}
\end{aligned}$$

If we consider the deallocation command labeled $d_1$, we see that the set of locations it may deallocate,
$\{l_0\}$, intersects the set $R$ of locations reachable from the result of the letrec block. This
deallocation command violates the safety condition (it may lead to a dangling pointer error at
run-time), so VE returns the set containing $d_1$ for the letrec block. Consequently, the entry in
map $\Psi$ for procedure f is $\{d_1\}$.

$$\Psi[f] = \{d_1\}$$


5.4 An Algorithm for Inserting Deallocation Commands 


This section describes a simple algorithm for inserting correct deallocation commands into KID7 
programs. This algorithm only deallocates objects that are directly named in the control region
that bounds the lifetime of the object. To be more complete, the algorithm would have to insert
bindings from new variables to the nested components of dead structures in order to deallocate 
them. The details of this are left to the reader. 


First, we look at the transformations we expect the deallocation insertion algorithm to perform. 
Then in the next four sections we develop the actual algorithms for inserting deallocation code. 


5.4.1 Desired Results of Insertion Algorithm 


Let us look at a few examples. In the following code fragment, we should be able to determine
that variable x3 can name the same objects as x1 and x2, but that x1 and x2 must be bound to
different objects. Therefore, the best transformation would be to deallocate x1 and x2 but not x3,
as shown:


{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1 else x2;
in 7 }

is transformed into

{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1 else x2;
  ---
  Dealloc(x1);
  Dealloc(x2);
in 7 }


There is another correct way to transform this program. We could have inserted a deallocation
command for identifier x3 instead of the commands put in for identifiers x1 and x2, as follows:


{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1 else x2;


  Dealloc(x3);
in 7 }


This transformation is not as good as the previous one because it only deallocates one of the tuples
that are allocated when both could be deallocated. When inserting deallocation commands, we
should try to find as many variables that are bound to non-overlapping sets of labels as possible,
and insert deallocation commands on these variables.

It is not always possible to insert deallocation commands that deallocate all dead structures if we do
not insert conditional deallocation commands. In the following example, we can insert deallocation
commands for x1 and x2, but not x3, because it may be bound to the same tuple as x1.


{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l3}MakeTuple(68,47);
in 7 }

is transformed into

{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l3}MakeTuple(68,47);
  ---
  Dealloc(x1);
  Dealloc(x2);
in 7 }


However, if we insert a conditional deallocation command, then we can deallocate all of the tuples
that are allocated, as shown below:


{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l3}MakeTuple(68,47);


  Dealloc(x1);
  Dealloc(x2);
  if (x3 ≠ x1) then
    { ---
      Dealloc(x3) }
  else { };
in 7 }


In fact, we can always take the set of all of the variables in a letrec block which are bound to 
objects whose lifetime is definitely contained in the lifetime of the block, and insert conditionals 
to guarantee that each distinct object to which the variables are bound at run time is deallocated 
exactly once. 
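

The run-time effect of such conditionals can be imitated in Python with a small sketch. It is an analogy rather than KID code: Python object identity (the `is` operator) plays the role of the pointer comparison between x3 and x1, and the hypothetical callback free stands in for Dealloc.

```python
def dealloc_once(candidates, free):
    """Call `free` on each distinct object among `candidates` exactly once.

    Distinctness is decided by identity, not by structural equality, which
    mirrors the conditional test inserted before a deallocation command.
    """
    freed = []
    for obj in candidates:
        if not any(obj is earlier for earlier in freed):
            free(obj)
            freed.append(obj)
    return len(freed)
```

If x3 aliases x1, the three variables name two objects and free is called twice; if x3 names a freshly allocated tuple, free is called three times. Two tuples with equal contents, such as the two (3,4) tuples, still count as distinct objects.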


Yet another way we can transform this example is to insert a call to copy on the true side of the
conditional, so that the object bound to x3 is always different from that bound to variable x1. This
may not make sense in this particular case, because it costs more to allocate an object than to
perform an equality test (as we did in the previous transformation of this example). Inserting a call
to copy makes sense if this expression is executed many times and it is much more likely to take
the else branch than the then branch of the conditional. Then the amortized cost of the extra
copy will be much less than the cost of the conditional before the deallocation command.


{ x1 = ^{l1}MakeTuple(3,4);
  x2 = ^{l2}MakeTuple(3,4);
  x3 = If p then
         copy(x1)
       else ^{l3}MakeTuple(68,47);
  Dealloc(x1);
  Dealloc(x2);
  Dealloc(x3);
in 7 }


The following example shows that we may have to insert code in order to name all of the objects
that may be deallocated. We can insert a deallocation on variable x directly, but we must insert
a binding to name the second component of the object named by x, which is also a structure that
may be deallocated in the outer block expression.


{ x = { y = ^{l1}MakeTuple(3,4);
        z = ^{l2}MakeTuple(4,y);
      in z }
in 7 }

is transformed into

{ x = { y = ^{l1}MakeTuple(3,4);
        z = ^{l2}MakeTuple(4,y);
      in z }
  w = Select_2(x);
  ---
  Dealloc(x);
  Dealloc(w);
in 7 }


5.4.2 The Algorithm 


This section presents a simple algorithm for inserting deallocation commands in KID~ programs.
This algorithm only inserts commands to deallocate tuples that are named by variables in the 
program. It does not insert bindings to name components of tuples whose lifetimes are bounded by 
that of the block. It also does not insert conditionals after the barrier. Once the basic algorithm is 
understood it is straightforward to increase its effectiveness by having it insert code to deallocate 
the components of dead structures and insert conditional deallocation commands to deallocate all 
structures bound to identifiers that may be aliased. 


The algorithm works in a greedy fashion on the set of identifiers bound in a block. It inserts 
deallocation commands for each identifier that is bound to a set of labels that satisfies these three 
conditions: 


1. The lifetime of each of the labels in the set is enclosed by that of the block. 
2. None of the labels are deallocated by one of the deallocation commands inserted earlier. 


3. None of the labels are deallocated elsewhere in the program. 
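

The greedy selection described by these three conditions can be sketched in Python. The sketch assumes the analysis results are already available as plain sets: env maps each bound variable to its set of labels, inherited and escaping are the label sets used to decide condition 1, and already_freed holds the labels deallocated elsewhere. These names are hypothetical and the encoding is chosen only for illustration.

```python
def choose_deallocs(bound_vars, env, inherited, escaping, already_freed):
    """Greedily pick variables whose labels may all be deallocated in this block."""
    freed = set(already_freed)
    chosen = []
    for var in bound_vars:
        labels = env[var]
        if not isinstance(labels, frozenset) or not labels:
            continue                      # scalars name no object
        if labels & inherited or labels & escaping:
            continue                      # condition 1: lifetime not enclosed by the block
        if labels & freed:
            continue                      # conditions 2 and 3: already deallocated
        chosen.append(var)
        freed |= labels
    return chosen
```

Because the choice is greedy, the result depends on the order in which identifiers are visited. With x1 bound to {l1}, x2 to {l2}, and x3 to {l1, l2}, visiting x1 and x2 first selects both of them, whereas visiting x3 first selects only x3.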


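This algorithm is implemented by four procedures: TP and TE, which transform programs and
expressions, and DS and DX, which return a list of deallocation statements for lists of bound
variables and single bound variables in a given letrec block.


Here are the signatures of procedures used in the insertion algorithm:


$$\begin{aligned}
\mathcal{TP} &: \mathit{Prog} \rightarrow \mathit{Prog} \\
\mathcal{TE} &: \mathit{Exp} \rightarrow \mathit{Env} \rightarrow \mathit{Store} \rightarrow \mathit{FEnv} \rightarrow \mathit{Exp} \\
\mathcal{DS} &: X^* \rightarrow \mathit{Env} \rightarrow \mathit{Store} \rightarrow Ls \rightarrow Ls \rightarrow Ls \rightarrow DS \\
\mathcal{DX} &: V \rightarrow X \rightarrow Ls \rightarrow Ls \rightarrow Ls \rightarrow (DS \times Ls)
\end{aligned}$$


Procedure TP takes a program, computes the function environment for the program, and calls
procedure TE to insert the appropriate deallocation statements in the body of each procedure in
the program. Procedure TE takes an expression e and the most general environment, store, and
function environment in which that expression executes. It returns a transformed expression e'.


$$\begin{aligned}
\mathcal{TP}[\![ pr ]\!] = \{\ & [\![ \{ \cdots f_i(x_1, \cdots, x_n) = e_i \cdots \} ]\!] = pr; \\
& D_0 = \mathit{InitialDMAP}(pr); \\
& (\Phi, D) = \mathit{ComputeFEnv}(pr, \bot_{FEnv}, D_0); \\
& \forall f_i \in \{f_0, \cdots, f_k\}: \\
& \quad \tau_{f_i} = \mathit{TypeOf}(f_i, pr); \\
& \quad (v_{i,1}, \cdots, v_{i,n}, \sigma_i) = \mathcal{RIV}[\![ \tau_{f_i} ]\!]; \\
& \quad [\![ e_i' ]\!] = \mathcal{TE}[\![ e_i ]\!]\,\bot_{Env}[v_{i,1}/x_1, \cdots, v_{i,n}/x_n]\,\sigma_i\,\Phi; \\
& \text{in } [\![ \{ \cdots f_i(x_1, \cdots, x_n) = e_i' \cdots \} ]\!]\ \}
\end{aligned} \qquad (5.12)$$


Figure 5.3: Procedure to insert deallocation commands into programs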


The procedures DS and DX are used by procedure TE when translating letrec blocks. Procedure
DS takes the set of variables bound by the letrec block, and the environment and store that are
active in the bindings of the letrec block. In addition to the environment and store, it takes the
sets of inherited, escaping, and previously deallocated object labels. The inherited labels are those
that are reachable from the context of the letrec block, the escaping labels are those reachable
from the result of the letrec block, and the previously deallocated labels are those deallocated in
the bindings of the letrec block. Procedure DS returns the set of deallocation commands for the
identifiers that are determined to be safely deallocatable.


Procedure DS calls procedure DX on each bound identifier. Procedure DX is the procedure that
actually generates a deallocation command for an identifier x when it is safe to deallocate the
value of that identifier in a particular context. If procedure DX is applied to a variable x, and
the deallocation safety condition is met for x, then DX returns a deallocation command for x.
Procedure DX takes as input the binding of x, the variable x, and the sets of inherited, escaping,
and previously deallocated object labels, and returns a set of deallocation commands and the set
of object labels that would be deallocated by those commands.


5.4.3 Inserting Deallocation Commands in Programs 


Procedure 7P, shown in Figure 5.3, takes a program pr and returns anew program with deallocation 
statements added to the bodies of each of the procedures in pr and the main expression of pr. 
Procedure TP calls procedure T€ on each function body and the main expression of the program 
with the most general environment and store in which those expressions could be evaluated. It 
then reassembles the transformed expressions into a new program pr’. 
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5.4.4 Inserting Deallocation Commands in Expressions 


The procedure TE, which inserts deallocation commands into expressions, takes an expression, an
environment, a store, and a function environment, and returns a new expression and the new set
of labels that are deallocated during the execution of the new expression.


Procedure TE does not insert any deallocation statements in simple expressions and primitive
expressions. The clauses shown below handle the processing of these expressions. The result
returned from the function TE is a syntactic expression. These values are surrounded with syntax
brackets, i.e., [[ x ]], to show that they are new program text.


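\[
\begin{aligned}
\mathcal{TE}[\![ n ]\!]\,\rho\,\sigma\,\Phi &= [\![ n ]\!] && \text{where } n \text{ is a number}\\
\mathcal{TE}[\![ b ]\!]\,\rho\,\sigma\,\Phi &= [\![ b ]\!] && \text{where } b \text{ is a boolean}\\
\mathcal{TE}[\![ x ]\!]\,\rho\,\sigma\,\Phi &= [\![ x ]\!] && \text{where } x \text{ is a variable}\\
\mathcal{TE}[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\Phi &= [\![ +(se_1, se_2) ]\!]
\end{aligned}
\]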


As shown below, no changes are made to function application expressions. All changes will be made
to the body of the function f when it is transformed.


\[
\mathcal{TE}[\![ {}^{k}f(se_1,\ldots,se_n) ]\!]\,\rho\,\sigma\,\Phi = [\![ {}^{k}f(se_1,\ldots,se_n) ]\!]
\]


Procedure TE processes conditional expressions by generating a new conditional with both branches 
transformed, as shown below: 


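\[
\begin{aligned}
&\mathcal{TE}[\![ \mathrm{if}(se_0, e_1, e_2) ]\!]\,\rho\,\sigma\,\Phi = \\
&\quad \{\ [\![ e_1' ]\!] = \mathcal{TE}[\![ e_1 ]\!]\,\rho\,\sigma\,\Phi; \\
&\quad\ \ \, [\![ e_2' ]\!] = \mathcal{TE}[\![ e_2 ]\!]\,\rho\,\sigma\,\Phi; \\
&\quad\ \ \, \mathrm{in}\ [\![ \mathrm{if}(se_0, e_1', e_2') ]\!]\ \}
\end{aligned}
\]
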
No changes need to be made to tuple manipulation primitives: 


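\[
\begin{aligned}
\mathcal{TE}[\![ {}^{l}\mathrm{MakeTuple}(se_1,\ldots,se_m) ]\!]\,\rho\,\sigma\,\Phi &= [\![ {}^{l}\mathrm{MakeTuple}(se_1,\ldots,se_m) ]\!]\\
\mathcal{TE}[\![ \mathrm{Select}_i(se) ]\!]\,\rho\,\sigma\,\Phi &= [\![ \mathrm{Select}_i(se) ]\!]
\end{aligned}
\]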


As in the procedure for verification, the processing of letrec blocks is where most of the work 
is done during program transformation. First, the environment, store, and set of object labels 
deallocated by the let block must be computed. Then new binding right-hand-sides must be 
generated by transforming the old bindings. Then the set of labels reachable from the result must 
be computed. A new set of deallocation statements is generated by calling procedure DS with the 
set of identifiers bound by the letrec block, the environment, the store, and the sets of reachable, 
allocated, and deallocated labels. Finally, the new right-hand-side expressions and deallocation 
statements are assembled into a new letrec block and returned. The definition of this clause is 
shown in Figure 5.4. 


Procedure DS takes a list of the identifiers of a letrec block, the environment, and store of the 
body of the block, the set of labels of objects reachable from the context of the block, the set of 
labels of objects reachable from the result of the block, and the set of labels deallocated by the 
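

\[
\begin{aligned}
&\mathcal{TE}[\![ \{ Bs\ \text{---}\ Ds\ \mathrm{in}\ x \} ]\!]\,\rho\,\sigma\,\Phi = \\
&\quad \{\ [\![ x_1 = e_1; \ldots; x_n = e_n ]\!] = Bs; \\
&\quad\ \ \, [\![ \mathrm{Dealloc}(y_1); \ldots; \mathrm{Dealloc}(y_k) ]\!] = Ds; \\
&\quad\ \ \, \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
&\quad\ \ \, \langle \rho', \sigma', \Delta^{-}, \Delta^{\Phi} \rangle = \mathit{EvalBindings}_{A}(Bs, \Phi, \rho_0, \sigma, \bot_{DMap}); \\
&\quad\ \ \, [\![ e_1' ]\!] = \mathcal{TE}[\![ e_1 ]\!]\,\rho'\,\sigma'\,\Phi; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, [\![ e_n' ]\!] = \mathcal{TE}[\![ e_n ]\!]\,\rho'\,\sigma'\,\Phi; \\
&\quad\ \ \, [\![ Bs' ]\!] = [\![ x_1 = e_1'; \ldots; x_n = e_n' ]\!]; \\
&\quad\ \ \, I = \bigcup_{w \in FV(Bs)} \mathit{Reachable}(\rho[w], \sigma); \\
&\quad\ \ \, R = \mathit{Reachable}(\rho'[x], \sigma'); \\
&\quad\ \ \, [\![ Ds' ]\!] = \mathcal{DS}[\![ x_1, \ldots, x_n ]\!]\,\rho'\,\sigma'\,I\,R\,\Delta^{-} \\
&\quad\ \ \, \mathrm{in}\ [\![ \{ Bs'\ \text{---}\ Ds; Ds'\ \mathrm{in}\ x \} ]\!]\ \}
\end{aligned}
\]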


Figure 5.4: Clause to insert deallocation commands in letrec blocks 


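block bindings. It returns a deallocation statement for each bound identifier whose value satisfies
Condition 5.1, and the set of labels deallocated by those deallocation commands. It calls procedure
DX on each identifier. Procedure DX generates a deallocation command for each identifier that
satisfies the safety condition.

\[
\begin{aligned}
&\mathcal{DS}[\![ x_1, \ldots, x_n ]\!]\,\rho'\,\sigma'\,I\,R\,\Delta^{-} = \\
&\quad \{\ \langle [\![ Ds_1 ]\!], \Delta^{-}_1 \rangle = \mathcal{DX}(\rho'[x_1])[\![ x_1 ]\!]\,I\,R\,\Delta^{-}; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, \langle [\![ Ds_n ]\!], \Delta^{-}_n \rangle = \mathcal{DX}(\rho'[x_n])[\![ x_n ]\!]\,I\,R\,\Bigl(\Delta^{-} \cup \bigcup_{i<n} \Delta^{-}_i\Bigr); \\
&\quad\ \ \, \mathrm{in}\ [\![ Ds_1; \ldots; Ds_n ]\!]\ \}
\end{aligned}
\]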


5.4.5 Generating Deallocation Statements 


Procedure DX takes the value of a bound variable $x_i$, the variable $x_i$, and the set of labels inherited
by, escaping from, and deallocated by the surrounding letrec block. It returns a deallocation
command for $x_i$ if it is safe to deallocate the value of $x_i$ upon termination of the surrounding
letrec block. The value of $x_i$ may be safely deallocated if $x_i$ is bound to a reference to a structure
and that structure is allocated within the current letrec block, is not reachable from the result of
that letrec block, and cannot be deallocated by any other deallocation command.


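\[
\begin{aligned}
\mathcal{DX}(\bot)[\![ x ]\!]\,I\,R\,\Delta^{-} &= \langle [\![\ ]\!], \emptyset \rangle\\
\mathcal{DX}(N)[\![ x ]\!]\,I\,R\,\Delta^{-} &= \langle [\![\ ]\!], \emptyset \rangle\\
\mathcal{DX}(B)[\![ x ]\!]\,I\,R\,\Delta^{-} &= \langle [\![\ ]\!], \emptyset \rangle\\
\mathcal{DX}(ls)[\![ x ]\!]\,I\,R\,\Delta^{-} &= \mathrm{if}\ \ \emptyset = ls \cap I \ \wedge\ \emptyset = ls \cap R \ \wedge\ \emptyset = ls \cap \Delta^{-}\\
&\qquad \mathrm{then}\ \langle [\![ \mathrm{Dealloc}(x) ]\!], ls \rangle\\
&\qquad \mathrm{else}\ \langle [\![\ ]\!], \emptyset \rangle
\end{aligned}
\]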


The final clause of DX actually inserts all of the deallocation commands. The set of object labels
to which x may be bound is ls, the set of locations passed into the current expression from the
surrounding context is I, the set of locations reachable from the result of the expression is R, and
the set of locations deallocated elsewhere is $\Delta^{-}$. A deallocation is only inserted if ls is disjoint
from I, if ls is disjoint from R, and if ls is disjoint from $\Delta^{-}$. If these three conditions are met, then
a deallocation command is returned, along with the set ls of locations it may deallocate.
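
The three disjointness tests can be illustrated with a small Python sketch. It is not the implementation described in this thesis: the encoding of abstract values (a frozenset of label strings for a structure reference, any other value for bottom, N, or B) and the function names dx and ds are assumptions made only for illustration. The helper ds threads the growing set of already deallocated labels through successive calls, in the way procedure DS is described above.

```python
def dx(value, var, inherited, escaping, deallocated):
    """Return (commands, labels) for one bound variable (hypothetical encoding)."""
    # Only references to structures (sets of object labels) can be deallocated.
    if not isinstance(value, frozenset):
        return [], frozenset()
    ls = value
    safe = (ls.isdisjoint(inherited)
            and ls.isdisjoint(escaping)
            and ls.isdisjoint(deallocated))
    if safe:
        return ["Dealloc(" + var + ")"], ls
    return [], frozenset()


def ds(bindings, inherited, escaping, deallocated):
    """Apply dx to each (variable, abstract value) pair in order."""
    commands = []
    initial = frozenset(deallocated)
    seen = initial
    for var, value in bindings:
        cmds, labels = dx(value, var, inherited, escaping, seen)
        commands.extend(cmds)
        # Labels freed by an earlier command block later commands.
        seen = seen | labels
    return commands, seen - initial
```

With t bound to the label set {l0} and result bound to N, and with all three label sets empty, ds returns the single command Dealloc(t) together with the label set {l0}.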


A more aggressive algorithm for inserting deallocation commands would examine the contents of 
any tuple it might deallocate to see if any of its components was also a structure that could be 
deallocated. If so, then more deallocation commands could be inserted, along with corresponding 
bindings of new variables to selection expressions in order to name the appropriate tuple compo- 
nents. 


We do not present a more aggressive algorithm here. A more aggressive algorithm is basically the 
same as the one just discussed but augmented in places to track more information and to generate 
more complicated deallocation code. In Chapter 10, we discuss the deallocation command insertion 
algorithm that we implemented. 


5.4.6 Transforming Some Examples 


In this section we apply function TP to an example to see the process of inserting deallocation
commands. We will walk through the transformation of the example from Section 5.3.3 with the
deallocation statement removed. For reference, here is the text of the modified program pr:


{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = ^{k0}g(t);
      in result };


def g(t) = 
Select, (t) 


  def f0() =
    ^{k1}f(6,847);
}


First, we determine the types of f and g in program pr:


\[
\begin{aligned}
\mathit{TypeOf}(f, pr) &= (N \times N) \rightarrow N\\
\mathit{TypeOf}(g, pr) &= (\mathrm{Tuple}\ N, N) \rightarrow N
\end{aligned}
\]


We use these types to construct representative input vectors: 


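\[
\begin{aligned}
\mathcal{RIV}[\![ (N \times N) \rightarrow N ]\!] &= \langle \langle N, N \rangle, \bot_{Store} \rangle\\
\mathcal{RIV}[\![ (\mathrm{Tuple}\ N, N) \rightarrow N ]\!] &= \langle \langle \{l_{-1}\} \rangle, [l_{-1} \mapsto (\mathrm{Tuple}\ N, N)] \rangle
\end{aligned}
\]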


Procedure TP (Equation 5.12) computes the function environment for the program and then con- 
structs the following program: 


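{ def f(x,y) = $e_f'$;
  def g(t) = $e_g'$;
  def f0() = $e_{f_0}'$;
}

where

\[
\begin{aligned}
e_f' &= \mathcal{TE}[\![ e_f ]\!]\ \bot_{Env}[N/x, N/y]\ \bot_{Store}\ \Phi\\
e_g' &= \mathcal{TE}[\![ e_g ]\!]\ \bot_{Env}[\{l_{-1}\}/t]\ [l_{-1} \mapsto (\mathrm{Tuple}\ N, N)]\ \Phi\\
e_{f_0}' &= \mathcal{TE}[\![ e_{f_0} ]\!]\ \bot_{Env}\ \bot_{Store}\ \Phi
\end{aligned}
\]

and $e_f$ is the body of procedure f, $e_g$ is the body of procedure g, and $e_{f_0}$ is the body of
procedure f0.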


Let us follow the transformation of the body of procedure f. First we need to compute a number 
of values: 


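\[
\begin{aligned}
[\![ \{ BS\ \text{---}\ DS\ \mathrm{in}\ x \} ]\!] &= e_f\\
[\![ t = e_1;\ result = e_2 ]\!] &= BS\\
[\![\ ]\!] &= DS\\
\rho' &= [\{l_0\}/t,\ N/result,\ N/x,\ N/y]\\
\sigma' &= [l_0 \mapsto (\mathrm{Tuple}\ N, N)]\\
\Delta^{-} &= \emptyset
\end{aligned}
\]


where $\rho'$ and $\sigma'$ are the environment and store of the body of expression $e_f$ and $\Delta^{-}$ is the set of
labels of objects deallocated in $e_f$.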


These values are computed by applying EvalBindings to the bindings of the letrec block, the
current activation label, and the current function environment. This procedure finds the fixpoint
of the resulting environment, store, and set of deallocated objects’ labels. From these values, we
compute the additional values necessary to test the safety of deallocating the value bound to each
identifier at run-time:


\[ I = \emptyset \qquad R = \emptyset \]


No labels are inherited by the body of procedure f from the surrounding context, and no labels are
returned from f.


Now we apply function DS to the bound identifiers of the letrec block. This procedure calls DX
on variables t and result and the value to which that identifier is bound, along with I, R, and
$\Delta^{-}$:

\[ \mathcal{DX}(\{l_0\})[\![ t ]\!]\,I\,R\,\Delta^{-} = \langle [\![ \mathrm{Dealloc}(t) ]\!], \{l_0\} \rangle \]

This call returns a deallocation command for identifier t and a set containing object label $l_0$ because
it is safe to deallocate the value bound to t upon termination of the control region containing the
block bindings.


The call to DX on identifier result follows:

\[ \mathcal{DX}(N)[\![ result ]\!]\,I\,R\,\Delta^{-} = \langle [\![\ ]\!], \emptyset \rangle \]

No deallocation command is returned in this case because result is not bound to any objects.


The other two procedures in the program, g and fo, are unchanged because there are no objects 
safe to deallocate in those procedures. 


The transformed program is: 


{ def f(x,y) =
    { t = ^{l0}MakeTuple(x,y);
      result = ^{k0}g(t);
      Dealloc(t)
      in result };


def g(t) = 
Select, (t) 


  def f0() =
    ^{k1}f(6,847);
}


as we expected. 


5.5 Summary 


In this chapter we developed algorithms for performing object lifetime analysis and used this lifetime 
information to verify or insert object deallocation commands. This analysis technique is based on 
an abstraction of the operational semantics of KID~. 


In the next few chapters, we extend the analysis framework to handle more data types and higher- 
order functions. We also improve the modeling of activation labels to yield more precise information 
about the sharing of objects. 


Chapter 6 


Improving the Abstract Object 
Labels 


In this section we look at a more informative abstraction of activation labels that yields better 
information about the identity and lifetime of objects allocated by programs. First, we introduce 
a new abstraction based on regular expressions that partitions standard activation labels into 
equivalence classes. Next, we present the changes to the abstract interpreter definition necessary 
to use these activation labels. Finally, we analyze an example using these activation labels. 


6.1 A Better Abstraction of Activation Labels 


In Chapter 4, we saw one way to abstract activation labels so that abstract interpretation was 
guaranteed to terminate. However, we lost a great deal of information about the identities of 
objects that is very useful in the analysis of programs. In this section, we examine more precise 
abstractions of activation labels that yield better results in the analysis of programs. 


In Chapter 3, we saw that activation labels were composed of a sequence of expression labels 
separated by ‘.’, where each expression label was the label of a particular function application in 
the program. 


The abstraction of activation labels should preserve some information about the standard activation 
labels. In fact, we would like abstract activation labels to be exactly the same as standard activation 
labels except in recursive invocations of functions. Figure 6.1 shows an activation tree consisting
solely of non-recursive procedure calls. It is safe to leave the labels of such activations unabstracted
because the set of such labels is bounded by the size of the static call graph of a program.


Figure 6.2 shows the static call-graph of three procedures, f, g, and h, where g is a recursive 
procedure, and the corresponding activation tree showing the structure of the activation labels of 
recursive calls to g. We would like to distinguish the activations of the initial application of g 
inside procedure f from the recursive applications of g inside procedure g. We can capture this 
by abstracting sets of standard activation labels as regular expressions. For example, the abstract 
activation label 1.2+ would represent the following set of activation labels: 


{1.2, 1.2.2, 1.2.2.2,...} 


Figure 6.1: Nonrecursive activation tree


Figure 6.2: Call graph of a recursive procedure 


Likewise, the abstract activation label 1.2*.3 represents the set


{1.3, 1.2.3, 1.2.2.3,...}


which are the activation labels of the calls to procedure h.
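
The correspondence between an abstract activation label and the set of standard labels it denotes can be checked mechanically. The following Python sketch assumes a textual encoding in which label components are separated by "." and a component may carry a trailing "+" or "*"; the function names to_regex and denotes are hypothetical and are not part of the analysis itself.

```python
import re


def to_regex(abstract_label):
    """Translate an abstract label such as '1.2*.3' into a compiled regex.

    Every component is compiled together with its leading dot, so the
    pattern is meant to be matched against '.' + concrete_label.
    """
    pieces = []
    for part in abstract_label.split("."):
        if part and part[-1] in "+*":
            # Repeated component: one-or-more ('+') or zero-or-more ('*').
            pieces.append("(?:\\." + re.escape(part[:-1]) + ")" + part[-1])
        else:
            pieces.append("\\." + re.escape(part))
    return re.compile("".join(pieces))


def denotes(abstract_label, concrete_label):
    """True if the concrete label is in the set named by the abstract label."""
    return to_regex(abstract_label).fullmatch("." + concrete_label) is not None
```

Under this encoding, denotes("1.2+", "1.2.2") holds while denotes("1.2+", "1") does not, and denotes("1.2*.3", "1.3") holds because the repeated component may occur zero times.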


We can think of the program’s call graph (which can be statically determined because KID~ is 
a first order language) as a finite automaton that accepts some set of strings. These strings are 
the standard activation labels. Every function represents a state in the automaton, and every 
application primitive represents a labeled edge. Every state in the automaton is an accept state. 
Our improved abstract activation labels for a program are the minimal regular expressions accepted 
by the finite automaton derived from the program’s call-graph. 


The improved AL domain, shown below, consists of regular expressions that match all possible
concrete activation labels. Abstract activation labels consist of an activation label paired with an
expression label using “.”, the disjunction of two activation labels, or the zero or more repetitions
of an activation label.


\[ \alpha \in AL = \epsilon \mid (AL.L) \mid (AL + AL) \mid (AL)^{*} \]


As in the standard domain of activation labels, the abstract activation label domain is a flat
domain: all abstract activation labels are above the bottom element, but each one is incomparable
with all of the others. The abstraction function that maps a standard activation label into an
abstract activation label chooses the regular expression that accepts that particular activation
label.


Abstract object labels will now consist of pairs of our new abstract activation labels and the static 
MakeTuple labels. Abstract object references are still sets of abstract object labels. 


We also extend the function environment domain to map function names to mappings from products 
of abstract values, stores, and activation labels to pairs of an abstract value and a store. 


\[ \Phi \in FEnv = F \rightarrow ((V^{n} \times Store \times AL) \rightarrow (V \times Store)) \]


6.2 Example Abstraction Operators for Activation Labels 


The function that abstracts activation labels depends on program structure. Abstract activation 
labels form equivalence classes for different paths through the call graph. The domain of abstract 
activations corresponds to the minimal set of regular expressions that name all the paths that start 
at the root of the call graph and end at each node in the call graph. 


We can define an abstraction function for the program whose call graph is shown in Figure 6.1.
This program has four procedures, p, q, r, and s, together with f0, and five function application
expressions with labels $k_1$, $k_2$, $k_3$, $k_4$ and $k_5$.


The function that we need in the abstract interpreter takes an abstract label and the expression
label of an application expression, and returns a new abstract activation label. This function
simulates a DFA where there is a state for each acyclic path to each function, and transitions are
taken on the labels of application expressions. For example, the next activation label function NAL
for the program in Figure 6.1 looks like:


\[
\begin{aligned}
\mathcal{NAL}(\epsilon, k_1) &= k_1\\
\mathcal{NAL}(k_1, k_2) &= k_1.k_2\\
\mathcal{NAL}(k_1, k_3) &= k_1.k_3\\
\mathcal{NAL}(k_1.k_2, k_4) &= k_1.k_2.k_4\\
\mathcal{NAL}(k_1.k_2, k_5) &= k_1.k_2.k_5\\
\mathcal{NAL}(k_1.k_3, k_4) &= k_1.k_3.k_4\\
\mathcal{NAL}(k_1.k_3, k_5) &= k_1.k_3.k_5\\
\mathcal{NAL}(\alpha, k) &= \top \quad \text{otherwise}
\end{aligned}
\]


Essentially, procedure q was split into two states, depending on whether it had been reached by
the application labeled $k_2$ or the one labeled $k_3$.
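
Because the call graph of Figure 6.1 is acyclic, its next activation label function is a finite table. A minimal Python sketch follows, assuming labels are encoded as dot-separated strings, the empty string stands for the empty activation label, and the string "TOP" stands for the top element; these encodings and the name nal are illustrative assumptions.

```python
TOP = "TOP"  # stands in for the top element returned for impossible transitions

# Transition table: (current abstract label, application label) -> next label.
NAL_TABLE = {
    ("", "k1"): "k1",
    ("k1", "k2"): "k1.k2",
    ("k1", "k3"): "k1.k3",
    ("k1.k2", "k4"): "k1.k2.k4",
    ("k1.k2", "k5"): "k1.k2.k5",
    ("k1.k3", "k4"): "k1.k3.k4",
    ("k1.k3", "k5"): "k1.k3.k5",
}


def nal(alpha, k):
    """Look up the next abstract activation label; TOP if no such call exists."""
    return NAL_TABLE.get((alpha, k), TOP)
```

The two entries that begin with "k1.k2" and "k1.k3" are the two states into which procedure q is split.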


The next activation label function for the program whose call graph is shown in Figure 6.2 is more 
interesting, because this program contains recursive calls. We cannot split nodes to distinguish 
different paths through recursive calls, because this would lead to an infinite number of nodes. The 
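function NAL for this graph is:


\[
\begin{aligned}
\mathcal{NAL}(\epsilon, k_1) &= k_1\\
\mathcal{NAL}(k_1, k_2) &= k_1.k_2^{+}\\
\mathcal{NAL}(k_1.k_2^{+}, k_2) &= k_1.k_2^{+}\\
\mathcal{NAL}(k_1.k_2^{+}, k_3) &= k_1.k_2^{+}.k_3\\
\mathcal{NAL}(\alpha, k) &= \top \quad \text{otherwise}
\end{aligned}
\]


\[
\begin{aligned}
&\mathcal{E}_A[\![ {}^{k}f(se_1,\ldots,se_n) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \\
&\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, v_n = \mathcal{SE}_A[\![ se_n ]\!]\,\rho; \\
&\quad\ \ \, \alpha' = \mathcal{NAL}(\alpha, k); \\
&\quad\ \ \, \langle v, \sigma', \Delta^{-} \rangle = \Phi[f]\langle v_1, \ldots, v_n, \sigma, \alpha' \rangle; \\
&\quad\ \ \, \mathrm{in}\ \langle v, \sigma', \Delta^{-}, \Delta^{\Phi} \rangle\ \}
\end{aligned}
\]


Figure 6.3: Evaluation of procedure calls with improved abstract activation labels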


In this example, all activations of procedure g have activation label $k_1.k_2^{+}$. If there were more than
one non-recursive path to invoke procedure g, then these would have distinct activation labels.
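
The effect of collapsing recursive calls can be sketched as an abstraction function from standard activation labels to abstract ones. The Python below is an illustration under simplifying assumptions: recursion is direct (a procedure calls itself through a single application label), labels are lists of strings, and the name abstract_label is hypothetical.

```python
def abstract_label(concrete, recursive):
    """Collapse runs of a recursive application label into one 'k+' component.

    concrete  -- list of application labels, outermost call first
    recursive -- set of labels that sit on a direct recursive call edge
    """
    out = []
    for k in concrete:
        if k in recursive:
            # Repeated recursive calls fall into the same equivalence class.
            if out and out[-1] == k + "+":
                continue
            out.append(k + "+")
        else:
            out.append(k)
    return ".".join(out)
```

Every activation of g reached through any number of recursive calls labeled k2 maps to the same abstract label, so the number of abstract labels stays finite.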


6.3 Extensions to Abstract Interpreter


This section describes the way to revise the abstract interpreter to compute improved activation 
labels. The expression evaluator now takes an abstract activation label in addition to the environ- 
ment, store, and function environment that it took before. 


The new evaluation rule for function applications is shown in Figure 6.3. Note that the function
NAL is used to create a new abstract activation label given the current activation label $\alpha$ and the
expression label k. We look in the function environment $\Phi$ to find the value of the body of the
function evaluated with activation label $\alpha'$, which is the abstraction of the current activation label
concatenated with k.


The revised abstract interpreter clause that evaluates the MakeTuple primitive is shown in
Figure 6.4. This clause constructs a new object label from the current abstract activation label $\alpha$ and
the expression label l. Other than that it is the same as the original abstract interpreter clause for
MakeTuple. All other clauses of the abstract interpreter remain the same, except that activation
labels are passed to the expression evaluator.


6.4 Evaluation of Examples Using Improved Activation Labels 


Figure 6.5 contains an example that we saw earlier where sharing is falsely detected by the abstract 
interpreter using completely static object labels. Let us reexamine this example using our improved 
abstraction of activation labels and object labels. 


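Let us examine the input-output behavior of procedure g first. If g is applied to a number in context
($\rho$, $\sigma$, $\alpha$), it returns a reference to an object in location $\alpha : l_0$ and a store $\sigma'$ which is derived from
store $\sigma$ as follows:

\[ \sigma' = \sigma[\alpha : l_0 \mapsto (\sigma[\alpha : l_0] \sqcup (\mathrm{Tuple}\ N, N))] \]


\[
\begin{aligned}
&\mathcal{E}_A[\![ {}^{l}\mathrm{MakeTuple}(se_1,\ldots,se_m) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \\
&\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, v_m = \mathcal{SE}_A[\![ se_m ]\!]\,\rho; \\
&\quad\ \ \, v_{tuple} = (\mathrm{Tuple}\ v_1, \ldots, v_m); \\
&\quad\ \ \, ol = \alpha : l; \\
&\quad\ \ \, v_{tuple}' = \sigma[ol]; \\
&\quad\ \ \, \sigma' = \sigma[ol \mapsto (v_{tuple} \sqcup v_{tuple}')]; \\
&\quad\ \ \, \mathrm{in}\ \langle \{ol\}, \sigma', \emptyset, \bot_{DMap} \rangle\ \}
\end{aligned}
\]

Figure 6.4: Evaluation of tuple allocation with improved abstract activation labels


{ def f(w) =
    { t1 = ^{k0}g(w);
      w2 = w * 2;
      t2 = ^{k1}g(w2);
      r = (w * w2);
      t3 = ^{l3}MakeTuple(t1,t2,r);
      in t3 };
  def g(x) =
    { y = (x-21);
      t = ^{l0}MakeTuple(x,y);
      in t }
  def f0() =
    ^{k2}f(68);
};


Figure 6.5: Example with false sharing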


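Now, let us study the internal behavior of function f when applied to a number and the empty
store in activation $\epsilon$. We evaluate the bindings of the letrec block in the body of f to yield the
environment $\rho'$ and store $\sigma'$:


\[
\rho' = \bot_{Env}\left[\begin{array}{lcl}
w & \mapsto & N\\
t1 & \mapsto & \{\epsilon.k_0 : l_0\}\\
w2 & \mapsto & N\\
t2 & \mapsto & \{\epsilon.k_1 : l_0\}\\
r & \mapsto & N\\
t3 & \mapsto & \{\epsilon : l_3\}
\end{array}\right]
\qquad
\sigma' = \bot_{Store}\left[\begin{array}{lcl}
\epsilon.k_0 : l_0 & \mapsto & (\mathrm{Tuple}\ N, N)\\
\epsilon.k_1 : l_0 & \mapsto & (\mathrm{Tuple}\ N, N)\\
\epsilon : l_3 & \mapsto & (\mathrm{Tuple}\ N, N, N)
\end{array}\right]
\]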


Using the more precise abstraction of activation labels, we can distinguish between the tuples to
which t1 and t2 are bound. We can tell that they must be different objects. Therefore, in the
body of procedure f0, we can insert code to deallocate all three tuples allocated, rather than only
two. This degree of precision can be very useful.
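
The difference between the two labeling schemes can be made concrete with a short Python sketch. The string encodings of activation and object labels, and the names static_object_label, improved_object_label, and may_alias, are assumptions for illustration only.

```python
def static_object_label(activation, site):
    """Purely static scheme: the allocation site alone names the object."""
    return site


def improved_object_label(activation, site):
    """Improved scheme: pair the abstract activation label with the site."""
    return activation + ":" + site


def may_alias(labels_a, labels_b):
    """Two references may denote the same object if their label sets overlap."""
    return not labels_a.isdisjoint(labels_b)


def bindings(make_label):
    """Abstract values of t1 and t2 from Figure 6.5 under a labeling scheme."""
    t1 = {make_label("e.k0", "l0")}  # tuple built by g called at k0
    t2 = {make_label("e.k1", "l0")}  # tuple built by g called at k1
    return t1, t2
```

With static labels both t1 and t2 are bound to {l0}, so may_alias reports possible sharing; with the improved labels the two sets are disjoint and the tuples are known to be distinct.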


6.5 Summary 


There are many ways in which we can abstract activation labels. In this chapter we discussed one 
way to abstract activation labels that improves the effectiveness of the analysis compared to the 
abstracted activation labels that we used in Chapter 4. We use this abstraction for the remainder 
of the thesis, and we use a variation of these abstract activation labels in our implementation of 
the lifetime analysis. 


Chapter 7 


Abstracting and Analyzing Arrays 


The abstract interpretation of arrays is different from that of tuples: the size of an array is computed 
at run-time, while the size of a tuple is fixed at compile-time. Section 7.1 discusses our approach 
to abstracting arrays — we summarize all elements of an array by one abstract value. 


This array abstraction leads to problems determining whether there is sharing among the elements 
of the arrays. Section 7.2 discusses an improved array abstraction that contains an annotation 
informing whether any elements in the array are shared. 


Sometimes it is difficult to define an array using MakeArray, even though the program fits nicely in
the single-assignment paradigm. Id has I-structure arrays to extend the single-assignment paradigm
beyond the functional subset. I-structures are non-functional, single-assignment arrays whose
presence greatly increases the expressiveness of the language, and only slightly increases the complexity
of lifetime analysis. For example, writing a function that finds the inverse of a permutation takes
$O(n^2)$ space and time when written using MakeArray, but can easily be written in $O(n)$ time and
space using I-structures. Section 7.3 discusses the addition of I-structures to our instrumented and
abstract interpreters and their impact on the deallocation safety condition.
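
The permutation example can be sketched in Python with a write-once array that imitates an I-structure. This is only an approximation under stated assumptions: the class IStructure and its methods are hypothetical names, evaluation is sequential, and a read of an empty slot raises an error here, whereas a real I-structure read would be deferred until the slot is written.

```python
class IStructure:
    """Write-once array: each slot may be stored to at most one time."""

    def __init__(self, size):
        self._values = [None] * size
        self._full = [False] * size

    def store(self, index, value):
        if self._full[index]:
            raise RuntimeError("slot %d written twice" % index)
        self._values[index] = value
        self._full[index] = True

    def fetch(self, index):
        if not self._full[index]:
            # A real I-structure would defer this read instead of failing.
            raise LookupError("slot %d is still empty" % index)
        return self._values[index]


def inverse_permutation(perm):
    """Invert a permutation of 0..n-1 with one store per element, O(n) overall."""
    inverse = IStructure(len(perm))
    for i, p in enumerate(perm):
        inverse.store(p, i)
    return [inverse.fetch(i) for i in range(len(perm))]
```

Each element of the result is written exactly once, at the position named by the input, so no search through the permutation is needed to fill any single slot.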


7.1 Abstract Interpretation of Arrays 


Arrays are aggregate objects whose size is not determined until run-time. In the interpreter, 
the objects must be represented by structures with a fixed number of components because we 
require abstract interpretation to take a finite amount of time. We summarize the value of an 
array of arbitrary size as an abstract array with a single element. Our array abstraction has a 
single component that represents all of the elements of the concrete array. This single abstract 
element is the least upper bound of the abstraction of all of the concrete array elements. We call 
this summarization spatial summarization. Spatial summarization combines information about an 
uncertain reference or spatial path, not about an uncertain control path. 


For example, consider the following concrete array of tuples:

\[ (\mathrm{Array}\ 3, l_1, l_2, l_3) \]

where 3 denotes the length of the array and $l_1$, $l_2$ and $l_3$ are the labels of concrete tuples. The
abstraction of this array would be an abstract array with one element summarizing all of the
elements of the standard array:

\[ (\mathrm{Array}\ \{l_1, l_2, l_3\}) \]


\[ \mathit{Abs}_{Array}((\mathrm{Array}\ n, v_1, \ldots, v_n)) = \Bigl(\mathrm{Array}\ \bigsqcup_{1 \le i \le n} \mathit{Abs}_{V}(v_i)\Bigr) \]

Figure 7.1: Array abstraction operator


\[ (\mathrm{Array}\ v_0) \sqcup_{Array} (\mathrm{Array}\ v_1) = (\mathrm{Array}\ v_0 \sqcup_{V} v_1) \]

Figure 7.2: Array least upper bound operator


\[ (\mathrm{Array}\ v_0) \sqsubseteq_{Array} (\mathrm{Array}\ v_1) = v_0 \sqsubseteq_{V} v_1 \]

Figure 7.3: Array ordering operator


The element $\{l_1, l_2, l_3\}$ indicates that the components of the standard array could be any one of
the abstract tuples named by $l_1$, $l_2$ or $l_3$.
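
Spatial summarization of an array of references can be sketched in Python as a set union over the element labels. The tuple encoding ("Array", labels) and the function names below are illustrative assumptions; the sketch covers only arrays whose elements are object references.

```python
def abstract_array(element_labels):
    """Summarize a concrete array of object labels as one abstract element."""
    summary = frozenset()
    for label in element_labels:
        # The abstraction of a single reference is the singleton set of its
        # label; the least upper bound of such sets is their union.
        summary = summary | frozenset({label})
    return ("Array", summary)


def join_arrays(a, b):
    """Least upper bound of two abstract arrays: join their element values."""
    return ("Array", a[1] | b[1])


def leq_arrays(a, b):
    """Ordering on abstract arrays: compare their element values."""
    return a[1] <= b[1]
```

The length of the concrete array plays no role in the result, which is what keeps the abstract representation finite.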


If we had subscript range information, we might be able to abstract the elements of an array into 
a small number of elements that represent the values that could be present in subregions of the 
array under the standard interpretation. In that case, an array would be represented as a set of 
intervals and the abstract values that summarize the components of the standard array contained 
in those intervals. The use of range information during abstract interpretation is an area for further 
research. 


7.1.1 The Abstract Array Domain 


We add the following definition of arrays to our abstract domains, and revise the definition of the 
abstract store and storable value domains. 


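\[
\begin{aligned}
v_{array} \in Array &= (\mathrm{Array}\ V) && \text{Arrays}\\
sv \in SV &= Tuple + Array && \text{Storable Values}\\
\sigma \in Store &= L \rightarrow SV && \text{Stores}
\end{aligned}
\]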


Figure 7.1 contains the function $\mathit{Abs}_{Array}$, which maps standard array values into abstract array
values. Figure 7.2 contains the least upper bound operator on abstract arrays, and Figure 7.3
contains the ordering operator for abstract arrays.


These domains, along with the added ordering and abstraction operators, allow us to revise the 
abstract interpreter to model arrays. 


7.1.2 Abstracting the Array Primitives 


The following two clauses, 7.1 and 7.2, give the abstracted evaluation rules for the array primitives.
As in the standard interpreter, the MakeArray primitive is subscripted with the name of a function
$f_i$ and takes a length value n and r values to be passed to the calls to $f_i$. The length n is ignored,
because abstract arrays all contain a single component. The abstract interpretation of this primitive
uses the static label l of the primitive directly as the object label of the abstract array created. In
this way, it resembles the abstract interpretation of the MakeTuple primitive.


First, we compute the value of the application of function $f_i$ to the input value consisting of
the value N, the r input values, and the current store. As with the interpretation of function
applications, we look up the return value of the function application in the function environment
and add the input value to the interesting domain of function $f_i$ in the new domain map delta $\Delta^{\Phi}$.
We use the result value of the function application as the representative element value of the array.


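\[
\begin{aligned}
&\mathcal{E}_A[\![ {}^{l}\mathrm{MakeArray}_{f_i}(se_0, se_1, \ldots, se_r) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \qquad (7.1)\\
&\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, v_r = \mathcal{SE}_A[\![ se_r ]\!]\,\rho; \\
&\quad\ \ \, \alpha' = \mathcal{NAL}(\alpha, l); \\
&\quad\ \ \, \langle u, \sigma', \Delta^{-} \rangle = \Phi[f_i]\langle N, v_1, \ldots, v_r, \sigma, \alpha' \rangle; \\
&\quad\ \ \, v_{array} = (\mathrm{Array}\ u); \\
&\quad\ \ \, ol = \alpha : l; \\
&\quad\ \ \, v_{array}' = \sigma'[ol]; \\
&\quad\ \ \, \sigma'' = \sigma'[ol \mapsto (v_{array} \sqcup v_{array}')]; \\
&\quad\ \ \, \Delta^{\Phi}[f_i] = \{\langle N, v_1, \ldots, v_r, \sigma, \alpha' \rangle\}; \\
&\quad\ \ \, \mathrm{in}\ \langle \{ol\}, \sigma'', \Delta^{-}, \Delta^{\Phi} \rangle\ \}
\end{aligned}
\]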


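The abstract interpretation of the Fetch primitive is very similar to the abstraction of the $\mathrm{Select}_i$
primitive. Fetch takes two values: a set ls of labels and an index. Fetch takes the least upper
bound of the arrays to which each of the labels in ls is bound in store $\sigma$, and then returns the
element value of that array.


\[
\begin{aligned}
&\mathcal{E}_A[\![ \mathrm{Fetch}(se_1, se_2) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \qquad (7.2)\\
&\quad \{\ ls = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
&\quad\ \ \, (\mathrm{Array}\ v) = \bigsqcup_{ol \in ls} \sigma[ol]; \\
&\quad\ \ \, \mathrm{in}\ \langle v, \sigma, \emptyset, \bot_{DMap} \rangle\ \}
\end{aligned}
\]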


The abstraction of the Bounds primitive is very simple. It ignores its argument and returns an
abstract integer as its result.

\[
\mathcal{E}_A[\![ {}^{k}\mathrm{Bounds}(se) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \{\ \mathrm{in}\ \langle N, \sigma, \emptyset, \bot_{DMap} \rangle\ \}
\]

Now that we have an abstraction of array values and have augmented the abstract interpreter with
clauses for the array primitives, let us examine some array program examples and see how our
lifetime analysis algorithm performs.


7.1.3 Example Array Programs


The first example we look at is shown below. It consists of a function f1 that takes three numbers and
constructs an array containing a different tuple in each element. The function g1 is the function
that defines each of the array elements.


def g1 (i, x, y) =
  ^{l0}MakeTuple(x,y,i);


def f1 (n, x, y) =
  { a = ^{l1}MakeArray_{g1}(n, x, y);
    in 3 };


Our abstract interpretation of this example computes the following representation for the value of
$\rho$ and $\sigma$ within f1's body:


\[
\begin{aligned}
\rho[a] &= \{l_1\}\\
\sigma[l_1] &= (\mathrm{Array}\ \{l_0\})\\
\sigma[l_0] &= (\mathrm{Tuple}\ N, N, N)
\end{aligned}
\]


That is, variable a is bound to an array labeled $l_1$ which contains a three-tuple of numbers labeled
$l_0$ as its element.


We can determine that the lifetime of the array labeled $l_1$ is bounded by the lifetime of f1, because
$l_1$ is not reachable from the labels inherited by f1 (the empty set) and because $l_1$ is not returned
as part of the result of f1, which returns a single number. The same is true of the tuple labeled
$l_0$: its lifetime is bounded by that of procedure f1.




Now consider the following example, which is similar to the previous one, except that the tuple
labeled $l_0$ is allocated by procedure f2 and passed to the procedure g2 that computes the elements
of the array $l_1$. Thus, f2 allocates an array where a tuple is shared by each of the elements.


def f2 (n, x, y) =
  { t = ^{l0}MakeTuple(x,y,4);
    a = ^{l1}MakeArray_{g2}(n, t);
    in 3 };


def g2 (i, t) = t;


The representation computed for the value of a is the same as in the first example:


\[
\begin{aligned}
\rho[a] &= \{l_1\}\\
\sigma[l_1] &= (\mathrm{Array}\ \{l_0\})\\
\sigma[l_0] &= (\mathrm{Tuple}\ N, N, N)
\end{aligned}
\]


The variable a is bound to an array labeled $l_1$ containing a tuple or tuples labeled $l_0$. In this
example, we can determine that the lifetimes of array $l_1$ and tuple $l_0$ are bounded by the lifetime
of procedure f2.


In both of these examples, we can verify that it is safe to deallocate the array bound to a when
either f1 or f2 terminates, because label $l_1$ is allocated within the body of f1 and f2, $l_1$ does not
escape, and $l_1$ cannot be deallocated elsewhere.


There is one fact we have not been able to uncover using our lifetime analysis, and that is that in 
the first example, each element of the concrete array is distinct, and that in the second example, 
each element of the concrete array is the same. If there is no sharing, then the compiler may insert 
code to deallocate each element of the array. If there is sharing, then deallocation of the elements 
becomes a little more difficult because we cannot deallocate any element more than once. We can 
work around this problem with run-time support. The run-time code that deallocates the elements 
of an array must keep track of the objects it has deallocated to ensure that it deallocates each 
unique element of the array exactly once. 
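
A minimal Python sketch of such run-time support is shown below. It assumes that each array element is represented by an identifier for the object it refers to and that a caller-supplied function free releases one object; both the representation and the name deallocate_elements are hypothetical.

```python
def deallocate_elements(elements, free):
    """Free each distinct object referenced by the array exactly once.

    elements -- iterable of object identifiers stored in the array
    free     -- callback that releases the storage of one object
    Returns the number of objects actually freed.
    """
    already_freed = set()
    for obj in elements:
        if obj not in already_freed:
            already_freed.add(obj)
            free(obj)
    return len(already_freed)
```

The bookkeeping set is the extra cost paid when the compiler cannot prove that the elements are unshared; for an array known to be unshared, a plain loop calling free on every element would suffice.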


The first example actually has no sharing. But because the abstraction does not yield sharing 
information, the compiler must generate code that carefully deallocates each distinct element of 
the array a for both f1 and f2. This strategy of code generation is safe, but is less efficient than if 
we could determine that there was no sharing of elements in procedure f1. 


7.2 Sharing Analysis in Arrays 


In the previous section, we defined an abstraction of arrays and showed how to perform lifetime 
analysis on programs containing arrays. We also saw that the analysis does not capture an important
fact about the arrays, namely, whether the elements of the array are shared or not.


This section investigates a change to the representation of abstract arrays in the abstract interpreter 
so that the analysis yields sharing information. The approach we take enables us to determine 
whether two elements of an array are completely distinct or whether they may be shared at some 
level. 
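

\[
\begin{aligned}
(\mathrm{Array}\ s_0, v_0) \sqcup_{Array} (\mathrm{Array}\ s_1, v_1) &= (\mathrm{Array}\ (s_0 \sqcup_{S} s_1), (v_0 \sqcup_{V} v_1))\\
\mathrm{Unshared} \sqcup_{S} \mathrm{Unshared} &= \mathrm{Unshared}\\
\mathrm{Unshared} \sqcup_{S} \mathrm{Shared} &= \mathrm{Shared}\\
\mathrm{Shared} \sqcup_{S} \mathrm{Unshared} &= \mathrm{Shared}\\
\mathrm{Shared} \sqcup_{S} \mathrm{Shared} &= \mathrm{Shared}
\end{aligned}
\]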


Figure 7.4: Improved array least upper bound operators 


7.2.1 Modeling Sharing in the Abstract Array Domain 


In order to track the sharing of array elements, we add an annotation to each abstract array that
indicates whether the array components may be shared or not. This sharing annotation is drawn
from domain S, which consists of two values: Shared and Unshared, where $\mathrm{Unshared} \sqsubseteq \mathrm{Shared}$.


\[
\begin{aligned}
s \in S &= \mathrm{Shared} + \mathrm{Unshared} && \text{Sharing Predicate}\\
v_{array} \in Array &= (\mathrm{Array}\ S, V) && \text{Arrays}\\
sv \in SV &= Tuple + Array && \text{Storable Values}\\
\sigma \in Store &= L \rightarrow SV && \text{Stores}
\end{aligned}
\]


If we have an array of nested structures, we take Unshared to mean that the structures stored 
in each element of the array are completely unaliased from the structures stored in every other 
element of the array. 


Figure 7.4 contains the least upper bound operator on abstract arrays, Figure 7.5 contains the 
ordering operator for abstract arrays and Figure 7.6 contains the abstraction operator for arrays 
with sharing. 
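
The two-point sharing domain and its lifting to annotated arrays can be written out directly. In this Python sketch an abstract array is encoded as a pair (sharing, labels) with labels a frozenset; the encoding and the function names are assumptions made for illustration.

```python
UNSHARED = "Unshared"
SHARED = "Shared"


def join_sharing(s0, s1):
    """Least upper bound on the sharing domain, with Unshared below Shared."""
    return SHARED if SHARED in (s0, s1) else UNSHARED


def leq_sharing(s0, s1):
    """Ordering on the sharing domain."""
    return s0 == UNSHARED or s1 == SHARED


def join_arrays(a, b):
    """Join annotations and element values componentwise."""
    return (join_sharing(a[0], b[0]), a[1] | b[1])


def leq_arrays(a, b):
    """An array is below another if both components are."""
    return leq_sharing(a[0], b[0]) and a[1] <= b[1]
```

Joining an unshared array with a shared one yields a shared array, so information that elements may be aliased is never lost when control paths merge.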


7.2.2 Abstracting the Array Primitives with Sharing 


The clauses of the interpreter must be augmented to compute the proper sharing information.
The only change is to MakeArray, which generates either a shared array or an unshared array. A
call to $\mathrm{MakeArray}_{f_i}$ generates an unshared array only if all of the labels reachable from the
application of procedure $f_i$ are disjoint from the locations reachable from the arguments to the call
to MakeArray. Since none of the inherited labels are reachable from the element value resulting
from the application of $f_i$, all of the labels reachable from the element value must be allocated
within the application, and none may be shared among elements of the array.
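
The disjointness test described above can be sketched in Python. The sketch assumes a store encoded as a dictionary from each label to the labels that object refers to directly; the names reachable and sharing_annotation are hypothetical and only mirror the roles of Reachable, I, and R in the text.

```python
def reachable(roots, store):
    """All labels reachable from the root labels by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        stack.extend(store.get(label, ()))
    return seen


def sharing_annotation(arg_labels, element_labels, store_before, store_after):
    """Decide the annotation for an array built by one MakeArray call."""
    inherited = reachable(arg_labels, store_before)        # plays the role of I
    from_element = reachable(element_labels, store_after)  # plays the role of R
    if inherited.isdisjoint(from_element):
        return "Unshared"
    return "Shared"
```

For a call whose arguments are all numbers the inherited set is empty and the result is Unshared, whereas passing a previously allocated tuple that the element function returns makes the two sets overlap and the result Shared.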


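\[
\begin{aligned}
&\mathcal{E}_A[\![ {}^{l}\mathrm{MakeArray}_{f_i}(se_0, se_1, \ldots, se_r) ]\!]\,\rho\,\sigma\,\alpha\,\Phi = \\
&\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
&\quad\ \ \, \quad\vdots \\
&\quad\ \ \, v_r = \mathcal{SE}_A[\![ se_r ]\!]\,\rho; \\
&\quad\ \ \, \alpha' = \mathcal{NAL}(\alpha, l); \\
&\quad\ \ \, \langle u, \sigma', \Delta^{-} \rangle = \Phi[f_i]\langle N, v_1, \ldots, v_r, \sigma, \alpha' \rangle; \\
&\quad\ \ \, I = \bigcup_{1 \le j \le r} \mathit{Reachable}(v_j, \sigma); \\
&\quad\ \ \, R = \mathit{Reachable}(u, \sigma'); \\
&\quad\ \ \, v_{array} = \mathrm{if}\ R \cap I = \emptyset\ \mathrm{then}\ (\mathrm{Array}\ \mathrm{Unshared}, u)\ \mathrm{else}\ (\mathrm{Array}\ \mathrm{Shared}, u); \\
&\quad\ \ \, ol = \alpha : l; \\
&\quad\ \ \, v_{array}' = \sigma'[ol]; \\
&\quad\ \ \, \sigma'' = \sigma'[ol \mapsto (v_{array} \sqcup v_{array}')]; \\
&\quad\ \ \, \Delta^{\Phi}[f_i] = \{\langle N, v_1, \ldots, v_r, \sigma, \alpha' \rangle\}; \\
&\quad\ \ \, \mathrm{in}\ \langle \{ol\}, \sigma'', \Delta^{-}, \Delta^{\Phi} \rangle\ \}
\end{aligned}
\]


\[
\begin{aligned}
(\mathrm{Array}\ s_0, v_0) \sqsubseteq_{Array} (\mathrm{Array}\ s_1, v_1) &= (s_0 \sqsubseteq_{S} s_1) \wedge (v_0 \sqsubseteq_{V} v_1)\\
\mathrm{Unshared} \sqsubseteq_{S} \mathrm{Unshared} &= \mathit{True}\\
\mathrm{Unshared} \sqsubseteq_{S} \mathrm{Shared} &= \mathit{True}\\
\mathrm{Shared} \sqsubseteq_{S} \mathrm{Unshared} &= \mathit{False}\\
\mathrm{Shared} \sqsubseteq_{S} \mathrm{Shared} &= \mathit{True}
\end{aligned}
\]


Figure 7.5: Improved array ordering operators


\[
\begin{aligned}
&\mathit{Abs}_{Array}((\mathrm{Array}\ n, v_1, \ldots, v_n)) = \\
&\quad \{\ s = \mathrm{if}\ \bigwedge_{1 \le i,j \le n,\ i \ne j} v_i \ne v_j\ \mathrm{then}\ \mathrm{Unshared}\ \mathrm{else}\ \mathrm{Shared}; \\
&\quad\ \ \, \ldots\ \}
\end{aligned}
\]


Figure 7.6: Abstraction operator for arrays with sharing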


The only change to the Fetch evaluation clause is to make it fetch components from abstract arrays 
annotated with sharing information. 


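\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{k}\mathtt{Fetch}(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\ se_1\ ]\!]\,\rho; \\
\quad\ \ (\mathtt{Array}\ s, v) = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad \mathbf{in}\ (\,v, \sigma, \emptyset, \bot_{PMap}\,)\ \}
\end{array}
\]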


No change is needed in the clause for the Bounds primitive.


7.2.3 Reexamining the Array Examples


The first example we considered in Section 7.1.3 had no sharing in it. The example is shown below 
for reference. 


    def g1 (i, x, y) =
        ^l0 MakeTuple(x, y, i);


    def f1 (n, x, y) =
        { a = ^l1 MakeArray_g1 (n, x, y);
          in a };


Our new abstract interpretation should discover that there can be no sharing in this example. The 
arguments to the call to MakeArray are: 


\[
\begin{array}{lcl}
\rho[n] & = & N \\
\rho[x] & = & N \\
\rho[y] & = & N
\end{array}
\]


assuming that variables x and y are bound to integers. Thus, there are no locations reachable from
the arguments to the call to MakeArray:

\[ I = \emptyset \]


The result of the call to g1 is a reference to a tuple in location $l_0$, so the set of reachable locations
is $\{l_0\}$:

\[ R = \{l_0\} \]

and the intersection of the inherited locations and the reachable locations is the empty set. There- 
fore, there can be no sharing between elements of the array. There is no way, without side effects, 
that different elements of the array could end up sharing the same location. 


We end up computing the following representation for the value of a: 


\[
\begin{array}{lcl}
\rho[a] & = & \{l_1\} \\
\sigma[l_1] & = & (\mathtt{Array}\ \mathit{Unshared}, \{l_0\}) \\
\sigma[l_0] & = & (\mathtt{Tuple}\ N, N, N)
\end{array}
\]


That is, a is bound to an array labeled $l_1$, containing a tuple or tuples labeled $l_0$ as its elements,
and the tuples in different elements of the array are guaranteed to be distinct.


If we determine that the above array and its components are dead in some context, then we can 
insert code that deallocates the array and all of its components without having to insert any run- 
time code to detect sharing among the components. 


The second example from Section 7.1.3, which created an array containing shared elements, follows:

    def f2 (n, x, y) =
        { t = ^l0 MakeTuple(x, y, 4);
          a = ^l1 MakeArray_g2 (n, t);
          in a };


    def g2 (i, t) = t;


In this case, the arguments to the call to MakeArray are:

\[
\begin{array}{lcl}
\rho[n] & = & N \\
\rho[t] & = & \{l_0\}
\end{array}
\]


Therefore, the inherited locations are the set $\{l_0\}$:

\[ I = \{l_0\} \]

The application of procedure g2 yields a reference to location $l_0$, so the set of reachable locations
$R$ is:

\[ R = \{l_0\} \]


Since the intersection of $I$ and $R$ is non-empty, the array representation is shared.

\[
\begin{array}{lcl}
\rho[a] & = & \{l_1\} \\
\sigma[l_1] & = & (\mathtt{Array}\ \mathit{Shared}, \{l_0\}) \\
\sigma[l_0] & = & (\mathtt{Tuple}\ N, N, N)
\end{array}
\]


The variable a is bound to an array labeled $l_1$ containing a tuple or tuples labeled $l_0$, where there
is some sharing among the elements of the array.


In this example, we can still insert code to deallocate the array and its components if we determine 
that they are dead in some context, but we have to insert run-time code to detect sharing. 
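

The disjointness test applied in the two examples above can be sketched as follows. This is a
minimal illustration, not the implementation described in this thesis: the store is assumed to be a
dictionary from abstract labels to the labels each object refers to, and the helper names
reachable and array_sharing are hypothetical.

```python
def reachable(labels, store):
    """Return every label reachable from `labels` by following the store."""
    seen, work = set(), list(labels)
    while work:
        label = work.pop()
        if label not in seen:
            seen.add(label)
            work.extend(store.get(label, ()))
    return seen


def array_sharing(arg_labels, store_in, elem_labels, store_out):
    """Decide the sharing annotation for the array built by MakeArray."""
    inherited = reachable(arg_labels, store_in)   # I: reachable from the arguments
    result = reachable(elem_labels, store_out)    # R: reachable from the element value
    return "Shared" if inherited & result else "Unshared"


# First example: the arguments are integers, and g1 allocates a fresh tuple l0.
print(array_sharing(set(), {}, {"l0"}, {"l0": []}))
# Second example: the tuple l0 is passed in as an argument and returned by g2.
print(array_sharing({"l0"}, {"l0": []}, {"l0"}, {"l0": []}))
```

The first call yields Unshared and the second Shared, matching the annotations derived by hand.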


7.3 Modeling I-Structures


This section extends the instrumented and abstracted interpreters with I-structure array data types
and primitives. Although Id has both I-structure algebraic types and arrays, we discuss only
I-structure arrays; the implementation of other I-structure types follows directly from the model of
I-structure arrays. We use the array value domain, but add two new array operators to KID~:
MakeIArray and Store. The first subsection presents the standard semantics of I-structure arrays,
and the second subsection presents the abstract semantics extended with I-structure array operators.


In KID~, I-structures are created using the primitive MakeIArray with all slots empty, or bound
to $\bot$. Elements of the array may be filled in using the primitive Store and dereferenced using the
primitive Fetch. It is an error to store more than one value into a single I-structure array slot,
although we do not check this in the interpreter.


7.3.1 I-Structures in the Instrumented Interpreter 


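This section defines the interpreter clauses for the primitives MakeIArray and Store. The primitive
MakeIArray takes one value: a simple expression that evaluates to length n. It returns a newly
allocated array of length n where each component is initially unbound.


\[
\begin{array}{l}
\mathcal{E}_I[\![\ {}^{k}\mathtt{MakeIArray}(se)\ ]\!]\ \rho\,\alpha\,\sigma = \\
\quad \{\ ol = \alpha : k; \\
\quad\ \ n = \mathcal{SE}[\![\ se\ ]\!]\,\rho; \\
\quad\ \ v_0 = \bot; \\
\quad\quad \vdots \\
\quad\ \ v_{n-1} = \bot; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (\mathtt{Array}\ n, v_0, \ldots, v_{n-1})] \\
\quad \mathbf{in}\ (\,ol, \sigma', \{(\alpha, ol)\}, \emptyset, \emptyset\,)\ \}
\end{array}
\]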


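The primitive Store takes three values: a reference $ol$ to an I-structure array $a$, an index $i$, and a
value $v$, and returns the reference to the array and a store that has the $i$th component of $a$ bound to
$v$. The interpreter records the fact that a side-effect was done on the object labeled $ol$ by returning
a reference event for $ol$ in activation $\alpha$.


\[
\begin{array}{l}
\mathcal{E}_I[\![\ {}^{k}\mathtt{Store}(se_1, se_2, se_3)\ ]\!]\ \rho\,\alpha\,\sigma = \\
\quad \{\ ol = \mathcal{SE}[\![\ se_1\ ]\!]\,\rho; \\
\quad\ \ i = \mathcal{SE}[\![\ se_2\ ]\!]\,\rho; \\
\quad\ \ v = \mathcal{SE}[\![\ se_3\ ]\!]\,\rho; \\
\quad\ \ (\mathtt{Array}\ n, v_0, \ldots, v_i, \ldots, v_{n-1}) = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (\mathtt{Array}\ n, v_0, \ldots, v_{i-1}, v, v_{i+1}, \ldots, v_{n-1})] \\
\quad \mathbf{in}\ (\,ol, \sigma', \emptyset, \emptyset, \{(\alpha, ol)\}\,)\ \}
\end{array}
\]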


Now that we have seen the definition of the instrumented interpreter clauses for handling
I-structures, we can go on to the definition of the abstracted I-structure domains and the abstracted
I-structure interpreter clauses.
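

The behavior of the two instrumented clauses can be illustrated with a small Python sketch. It is
written under assumptions of our own: stores are dictionaries, an object label is the pair
(activation, site), None stands in for the unbound slot value, and the function names make_iarray
and istore are hypothetical. Unlike the interpreter clauses, the sketch rejects a second write to a
slot, purely to make the single-assignment rule visible.

```python
EMPTY = None  # stands in for the unbound (bottom) slot value


def make_iarray(store, activation, site, n):
    """Allocate an I-structure array of n empty slots; report an allocation event."""
    ol = (activation, site)
    new_store = {**store, ol: [EMPTY] * n}
    return ol, new_store, {(activation, ol)}


def istore(store, activation, ol, i, v):
    """Bind slot i of the array labeled ol to v; report a reference event."""
    slots = list(store[ol])
    # Assumption of this sketch only: the clauses in the text do not perform this check.
    if slots[i] is not EMPTY:
        raise ValueError("I-structure slot written twice")
    slots[i] = v
    return ol, {**store, ol: slots}, {(activation, ol)}


ol, s1, allocs = make_iarray({}, "e.k0", "l0", 3)
_, s2, refs = istore(s1, "e.k0.k1", ol, 0, "t1")
print(s2[ol])
```

Each operation returns a new store rather than mutating the old one, which mirrors the way the
clauses thread $\sigma$ and $\sigma'$ explicitly.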


7.3.2 I-Structures in the Abstract Interpreter 


In the abstracted interpreter, we use the array domains with sharing information. Whenever we 
store into an I-structure array, we make that array be a shared array. 


The following two clauses give the abstracted evaluation rules for the I-structure array
primitives. The primitive MakeIArray constructs an abstract array with no sharing whose
components are undefined. The primitive Store updates the component of the array to the least upper
bound of the current array element and the new value. Storing into an I-structure array may
potentially introduce sharing, so we upgrade the sharing indicator of the array to Shared when a
Store is performed.
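

A minimal Python sketch of these two abstract rules follows. The representation is an assumption
made for illustration: an abstract array is a pair of a sharing tag and a set of element labels, the
store holds only arrays, and an absent store entry is treated like an unshared array with no
elements. The names join, abs_make_iarray and abs_store are hypothetical.

```python
UNSHARED, SHARED = "Unshared", "Shared"


def join(a, b):
    """Least upper bound of two abstract arrays (sharing tag, element labels)."""
    sharing = SHARED if SHARED in (a[0], b[0]) else UNSHARED
    return (sharing, a[1] | b[1])


def abs_make_iarray(store, ol):
    """Abstract MakeIArray: an unshared array whose component is bottom."""
    fresh = (UNSHARED, frozenset())
    old = store.get(ol, fresh)
    return {ol}, {**store, ol: join(fresh, old)}


def abs_store(store, ls, v):
    """Abstract Store: every array that may be written becomes Shared and absorbs v."""
    updated = {ol: join(arr, (SHARED, frozenset(v))) if ol in ls else arr
               for ol, arr in store.items()}
    return ls, updated


ls, s = abs_make_iarray({}, "l0")
_, s = abs_store(s, ls, {"l1"})
_, s = abs_store(s, ls, {"l2"})
print(s["l0"])
```

Because the abstract array has a single summarized component, the two stores accumulate into one
element set instead of overwriting each other.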


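\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{l}\mathtt{MakeIArray}(se_1)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_{array} = (\mathtt{Array}\ \mathit{Unshared}, \bot); \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v'_{array} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{array} \sqcup v'_{array})] \\
\quad \mathbf{in}\ (\,\{ol\}, \sigma', \emptyset, \bot_{PMap}\,)\ \}
\end{array}
\]

\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{k}\mathtt{Store}(se_1, se_2, se_3)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\ se_1\ ]\!]\,\rho; \\
\quad\ \ v = \mathcal{SE}_A[\![\ se_3\ ]\!]\,\rho; \\
\quad\ \ \sigma' = \lambda ol.\ \begin{cases} \sigma[ol] \sqcup (\mathtt{Array}\ \mathit{Shared}, v) & \text{if } ol \in ls \\ \sigma[ol] & \text{otherwise} \end{cases} \\
\quad \mathbf{in}\ (\,ls, \sigma', \emptyset, \bot_{PMap}\,)\ \}
\end{array}
\]


7.3.3 Effect of I-Structures on Deallocation Safety Conditions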


The introduction of I-structures into KID~ has introduced a new mechanism by which objects can 
escape from a given activation. Previously, objects could be passed into an activation through the 
environment as inherited values or arguments, or they could be passed out of the activation as part 
of the results. Now, an I-structure with empty slots can be passed into an activation, and objects 
allocated within the activation can be stored into the empty slots and escape from the activation. 
Thus it is now possible for objects allocated within an activation to escape via the inherited objects. 


However, this new path for escaping objects does not significantly change the criteria that we use 
to decide that a particular activation contains an object’s lifetime. We now must determine that 
an object is not reachable from the result of an expression or from the objects inherited from the 
surrounding environment after the expression has executed. The only change in the tests is which 
store is used to determine reachability from inherited objects. Previously, we used the incoming
store to determine reachability; now we must use the outgoing store.


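Let us again consider the canonical letrec block with deallocation commands that we would like
to verify:

\[
\begin{array}{l}
e = \{\ x_1 = e_1; \\
\quad\quad\quad \vdots \\
\quad\quad\ \ x_n = e_n; \\
\quad\quad\ \ \mathtt{Dealloc}(y_1); \\
\quad\quad\quad \vdots \\
\quad\quad\ \ \mathtt{Dealloc}(y_m); \\
\quad\quad \mathbf{in}\ x_j\ \}
\end{array}
\]

where the environment, store, and function environment in which $e$ is to be evaluated are $\rho$, $\sigma$, and
$\Phi$, respectively.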


We compute environment $\rho'$ and store $\sigma'$, the resulting environment and store for the block bindings,
$\Delta^{-}$, the set of labels deallocated by the block bindings, and $v$, which is the result of the evaluation
of the expression, as shown below. In addition, we compute $R$, the set of object labels reachable
from the result of the expression, and $I$, the set of object labels reachable from the free variables
of the expression.


\[
\begin{array}{lcl}
\rho_0 & = & \rho[\bot/x_1, \ldots, \bot/x_n] \\
(\rho', \sigma', \Delta^{-}, \Delta^{P}) & = & \mathit{EvalBindings}_A([\![\ x_1 = e_1; \ldots; x_n = e_n\ ]\!], \Phi, \rho_0, \sigma, \bot_{PMap}) \\
v & = & \rho'[x_j] \\
R & = & \mathit{Reachable}(v, \sigma') \\
I & = & \bigcup_{y \in FV(e)} \mathit{Reachable}(\rho'[y], \sigma')
\end{array}
\]


Previously, $I$ was computed with respect to $\sigma$, the incoming store, and now it is computed with
respect to $\sigma'$, the result store.
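

The effect of switching stores can be seen in a small Python sketch. It is an illustration under
assumptions of our own: stores are dictionaries from labels to the labels they refer to, and the
helper names reachable and lifetime_contained are hypothetical. The scenario is an inherited
I-structure a into which the block stores a freshly allocated tuple t.

```python
def reachable(roots, store):
    """Return every label reachable from `roots` by following the store."""
    seen, work = set(), list(roots)
    while work:
        label = work.pop()
        if label not in seen:
            seen.add(label)
            work.extend(store.get(label, ()))
    return seen


def lifetime_contained(label, result_labels, inherited_roots, store):
    """True if `label` is reachable neither from the result (R) nor from inherited objects (I)."""
    r = reachable(result_labels, store)
    i = reachable(inherited_roots, store)
    return label not in r and label not in i


store_in = {"a": set()}               # inherited I-structure with empty slots
store_out = {"a": {"t"}, "t": set()}  # the block has stored tuple t into a

# Testing against the incoming store would wrongly report that t does not escape.
print(lifetime_contained("t", set(), {"a"}, store_in))
# Testing against the outgoing store detects the escape through the inherited array.
print(lifetime_contained("t", set(), {"a"}, store_out))
```

Only the second test is sound once I-structures are present, which is why $I$ must be computed
with respect to the result store.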


7.3.4 Example I-Structure Program 


We will now execute an example I-structure program to see how lifetime analysis performs in the
presence of side-effects. In the following I-structure program, procedure f0 allocates an empty
I-structure and passes it to procedure g, which fills it in with two tuples.


    def g (a) =
        { t1 = ^l1 MakeTuple(6.823, 6.847);
          t2 = ^l2 MakeTuple(6.847, 6.823);
          x1 = Store(a, 0, t1);
          x2 = Store(a, 1, t1);
          x3 = Store(a, 2, t2);
          in True };


    def f0 () =
        { a = ^l0 MakeIArray(3);
          v = ^k1 g(a);
          r = Fetch(a, 0);
          in r };


The tuples allocated in procedure g and bound to variables t1 and t2 are not returned as part of
g's result, yet they escape from the body of g. They are stored into the I-structure passed in as g's
argument.


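The result yielded by executing this program under the instrumented interpreter would be:


\[
\begin{array}{l}
(\ \epsilon.k_0.k_1.k_2 : l_1, \\
\quad \epsilon.k_0 : l_0 \mapsto (\mathtt{Array}\ 3,\ \epsilon.k_0.k_1.k_2 : l_1,\ \epsilon.k_0.k_1.k_2 : l_1,\ \epsilon.k_0.k_1.k_2 : l_2), \\
\quad \epsilon.k_0.k_1.k_2 : l_1 \mapsto (\mathtt{Tuple}\ 6.823, 6.847), \\
\quad \epsilon.k_0.k_1.k_2 : l_2 \mapsto (\mathtt{Tuple}\ 6.847, 6.823), \\
\quad \{ (\epsilon.k_0 : l_0,\ \epsilon.k_0),\ (\epsilon.k_0.k_1.k_2 : l_1,\ \epsilon.k_0.k_1.k_2),\ (\epsilon.k_0.k_1.k_2 : l_2,\ \epsilon.k_0.k_1.k_2) \}, \\
\quad \emptyset, \\
\quad \{ (\epsilon.k_0 : l_0,\ \epsilon.k_0),\ (\epsilon.k_0 : l_0,\ \epsilon.k_0.k_1.k_2) \}\ )
\end{array}
\]


The abstract interpreter, using our improved activation labels, would yield the following:


\[
\begin{array}{l}
(\ \{\epsilon.k_0.k_1.k_2 : l_1,\ \epsilon.k_0.k_1.k_2 : l_2\}, \\
\quad \epsilon.k_0 : l_0 \mapsto (\mathtt{Array}\ \mathit{Shared}, \{\epsilon.k_0.k_1.k_2 : l_1,\ \epsilon.k_0.k_1.k_2 : l_2\}), \\
\quad \epsilon.k_0.k_1.k_2 : l_1 \mapsto (\mathtt{Tuple}\ N, N), \\
\quad \epsilon.k_0.k_1.k_2 : l_2 \mapsto (\mathtt{Tuple}\ N, N), \\
\quad \bot_{PMap}\ )
\end{array}
\]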


We lose some information in the abstract domain, because it appears that both tuples escape from
the result of procedure f0. This approximation is safe, because nothing that is reachable under
the instrumented semantics appears unreachable under the abstracted semantics. Lifetime analysis 
using the abstract interpreter correctly determines that the two tuples may escape from the body 
of procedure g, even though neither of them is directly returned as part of g’s result. 


7.4 Summary 


This chapter described the abstraction of the array domains and how we have to use spatial
summarization to obtain a finite representation of arrays at compile-time. We used this abstract array
domain to extend our lifetime analysis algorithm to handle programs containing arrays. We then
extended the abstracted array domains and abstract interpreter in order to perform sharing analysis
on array elements, because the compiler can generate more efficient deallocation code if it knows
that no element of an array is shared.


We also added I-structure arrays to KID~ in this chapter. I-structures increase the expressiveness
of the language and allow us to write some programs more efficiently than if we had to use the
functional MakeArray construct. I-structures also introduce a new path for objects to escape from
a control region: objects may escape by being stored into I-structures that were inherited from
the surrounding context. We showed that our lifetime analysis algorithm correctly handles this
case.


Chapter 8


Algebraic and Recursive Types 


In Chapters 4 and 5 we saw how to summarize the behavior of KID~ over tuples, numbers, and 
booleans. In Chapter 7 we introduced the notion of spatial summarization which was necessary 
to generalize the values of arrays. Spatial summarization was introduced because, in general, the 
size of an array can only be determined at run-time, and we needed to be able to summarize the 
behavior of a program over all possible arrays. 


In this chapter, we develop an abstraction of algebraic types. The abstraction of non-recursive 
algebraic types is very straightforward. This abstraction is discussed in the first section of this 
chapter. 


The abstraction of recursive algebraic types in many abstract interpreters is very difficult because
the size of the representation of a recursively typed object can grow without bound. We see in
the second section of this chapter, however, that our abstract interpreter does not suffer from this
problem.


Our abstraction of non-recursive algebraic types is adequate for recursive algebraic types as well. 
This abstraction involves a form of spatial summarization because the number of nodes composing 
an object of recursive type can only be known at run-time, but our abstraction compresses it into 
a finite number of nodes at compile-time. 


Although our abstraction of algebraic types is general enough to model any recursive algebraic 
type safely, the only recursive type for which our implementation of the deallocation code insertion 
algorithm can generate deallocation code is lists. We discuss our abstraction of lists in the third 
section of this chapter and compare our abstraction with that of other researchers. 


The spatial summarization that occurs in the abstraction of recursive types makes it difficult to 
insert code to deallocate these objects because there may be sharing between the nodes of the 
objects. We need a better idea about the sharing that occurs between the elements of a recursively 
typed object. We discuss a way to approach this problem in Section 11.1.3. 


8.1 Abstraction of Algebraic Types 


In Chapter 2 we saw that oneofs, or algebraic types, are represented by tagged structures in the
standard interpreter. The tags distinguish instances of the different disjuncts of the algebraic type.
Evaluation of a given expression in different contexts may return different disjuncts of an algebraic
type. For this reason, an abstract oneof value must be a product of each of the possible disjuncts,
rather than a sum, as in the instrumented interpreter. The abstracted value must capture
information about the values resulting from evaluation in all possible contexts. For instance, an expression
that results in an object of type transaction, where:


type transaction = deposit I | withdrawal I 


might return either a reference to a deposit of 19.92, represented by:

\[ (0_2\ 19.92) \]

or a withdrawal of 353.0, represented by:

\[ (1_2\ 353.0). \]


The abstract interpreter must represent both possibilities in a single value. This expression would
return a reference to the following abstract oneof:

\[ (\mathtt{Oneof}\ (0\ N), (1\ N)) \]

which represents either a oneof with tag 0 (a deposit) or a oneof with tag 1 (a withdrawal).


The above abstract transaction value is the most defined abstract transaction value. This abstract
value represents standard values that are either deposits or withdrawals. We can also represent
transactions that could only be deposits as follows:

\[ (\mathtt{Oneof}\ (0\ N), \bot) \]

We represent transactions that can only be withdrawals as follows:

\[ (\mathtt{Oneof}\ \bot, (1\ N)) \]


Either or both of the components of an abstract transaction structure can be bottom. If an
expression $e$ evaluated in some context $C$ under the abstract interpreter yields a transaction
structure with bottom for the deposit component, then the same expression could never yield a
deposit structure if it were evaluated under the standard interpreter in a context compatible with $C$.


The following are the abstraction functions that map standard transactions into abstract
transactions:

\[
\begin{array}{lcl}
\mathit{Abs}_{Transaction}((0_2\ n)) & = & (\mathtt{Oneof}\ (0\ N), \bot) \\
\mathit{Abs}_{Transaction}((1_2\ n)) & = & (\mathtt{Oneof}\ \bot, (1\ N))
\end{array}
\]


This method of summarizing information about algebraic types is very general. As we shall see in
Section 8.2, it even handles recursive types appropriately.


\[
\mathit{Abs}_{Oneof}((tag_n\ v_0, \ldots, v_m)) =
(\mathtt{Oneof}\ \bot, \ldots, (tag\ \mathit{Abs}_V(v_0), \ldots, \mathit{Abs}_V(v_m)), \ldots, \bot)
\]

Figure 8.1: Oneof abstraction operator


\[
\begin{array}{l}
(\mathtt{Oneof}\ d_0, \ldots, d_n) \sqcup_{Oneof} (\mathtt{Oneof}\ d'_0, \ldots, d'_n) =
(\mathtt{Oneof}\ (d_0 \sqcup_{Disjunct} d'_0), \ldots, (d_n \sqcup_{Disjunct} d'_n)) \\
(i\ v_0, \ldots, v_m) \sqcup_{Disjunct} (i\ u_0, \ldots, u_m) =
(i\ (v_0 \sqcup_V u_0), \ldots, (v_m \sqcup_V u_m))
\end{array}
\]

Figure 8.2: Oneof least upper bound operators


\[
\begin{array}{l}
(\mathtt{Oneof}\ d_0, \ldots, d_n) \sqsubseteq_{Oneof} (\mathtt{Oneof}\ d'_0, \ldots, d'_n) = \bigwedge_{i} d_i \sqsubseteq_{Disjunct} d'_i \\
(i\ v_0, \ldots, v_m) \sqsubseteq_{Disjunct} (i\ u_0, \ldots, u_m) = \bigwedge_{i} v_i \sqsubseteq_V u_i
\end{array}
\]

Figure 8.3: Oneof ordering operators


8.1.1 Domains for Abstract Algebraic Types 


We add the following definitions of the abstract Disjunct and Oneof domains and revise the storable 
value domain SV as shown: 


\[
\begin{array}{lcl}
\mathit{Disjunct} & = & (N\ V, \ldots, V)_{\bot} \\
\mathit{Oneof} & = & (\mathtt{Oneof}\ \mathit{Disjunct}, \ldots, \mathit{Disjunct}) \\
SV & = & (\mathit{Tuple} + \mathit{Array} + \mathit{Oneof})_{\bot}
\end{array}
\]


Each value in the Disjunct domain is either a tagged tuple of denotable values or bottom. Each
value in the Oneof domain is a tuple of Disjuncts, and storable values (SV) are either tuples,
arrays, oneofs, or bottom. Stores still map abstract object labels to storable values.


Figure 8.1 contains the function $\mathit{Abs}_{Oneof}$, which maps standard oneof values into abstract oneof
values. Figure 8.2 contains the least upper bound operator on abstract oneofs, and Figure 8.3
contains the ordering operator for abstract oneofs.
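

The product-of-disjuncts representation can be illustrated with a short Python sketch, using the
transaction type as the running example. The encoding is an assumption made for illustration: an
abstract oneof is a tuple with one entry per tag (the tag is the position), None stands in for
bottom, numeric fields abstract to the string "N", and non-numeric fields are sets of labels. All
function names are hypothetical.

```python
BOT = None  # stands in for the bottom disjunct


def abs_value(v):
    """Abstract a field; this sketch only handles numbers, which collapse to N."""
    return "N"


def abs_oneof(tag, ntags, fields):
    """Abstract a tagged value: only the disjunct for `tag` is non-bottom."""
    return tuple(tuple(abs_value(f) for f in fields) if t == tag else BOT
                 for t in range(ntags))


def join_value(a, b):
    """Join two abstract fields: N with N is N, label sets are unioned."""
    if a == "N" and b == "N":
        return "N"
    return a | b


def join_disjunct(d, e):
    if d is BOT:
        return e
    if e is BOT:
        return d
    return tuple(join_value(x, y) for x, y in zip(d, e))


def join_oneof(o, p):
    """Least upper bound of two abstract oneofs, taken disjunct by disjunct."""
    return tuple(join_disjunct(d, e) for d, e in zip(o, p))


deposit = abs_oneof(0, 2, [19.92])
withdrawal = abs_oneof(1, 2, [353.0])
print(join_oneof(deposit, withdrawal))
```

Joining the abstraction of a deposit with that of a withdrawal produces the most defined
transaction value, in which both disjuncts are present.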


8.1.2 Abstract Interpretation of Algebraic Types 


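Figure 8.4 contains the clauses of the abstract interpreter that evaluate the primitives that
create and manipulate oneof objects. These clauses are similar to the ones from the instrumented
interpreter, except that they manipulate abstract oneof values.


\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{l}\mathtt{MakeOneof}_{tag,ntags}(se_1, \ldots, se_m)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![\ se_1\ ]\!]\,\rho; \\
\quad\quad \vdots \\
\quad\ \ v_m = \mathcal{SE}_A[\![\ se_m\ ]\!]\,\rho; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ d_0 = \bot; \\
\quad\quad \vdots \\
\quad\ \ d_{tag} = (tag\ v_1, \ldots, v_m); \\
\quad\quad \vdots \\
\quad\ \ d_{ntags-1} = \bot; \\
\quad\ \ v_{oneof} = (\mathtt{Oneof}\ d_0, \ldots, d_{ntags-1}); \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{oneof} \sqcup \sigma[ol])] \\
\quad \mathbf{in}\ (\,\{ol\}, \sigma', \emptyset, \bot_{PMap}\,)\ \} \\
\\
\mathcal{E}_A[\![\ \mathtt{IsTag?}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = (\,B, \sigma, \emptyset, \bot_{PMap}\,) \\
\\
\mathcal{E}_A[\![\ \mathtt{Select}_{tag,i}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\ se\ ]\!]\,\rho; \\
\quad\ \ (\mathtt{Oneof}\ d_0, \ldots, d_{tag-1}, d_{tag}, d_{tag+1}, \ldots, d_{ntags-1}) = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad\ \ (tag\ v_1, \ldots, v_m) = d_{tag} \\
\quad \mathbf{in}\ (\,v_i, \sigma, \emptyset, \bot_{PMap}\,)\ \}
\end{array}
\]

Figure 8.4: Abstract interpretation of algebraic type primitives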


8.2 Abstraction of Recursive Types 


Recursive types are a special case of algebraic types. Even though the individual nodes of a 
recursively typed object are of fixed size, the object can have a size that is unbounded. In the 
abstract interpreter, we need some form of spatial summarization that collapses a list or tree object 
of potentially unbounded size into a representation with bounded size. As we see later in this 
section, the spatial summarization of recursive types naturally follows from our abstraction of 
locations and stores. 


Consider the definition of copy_list, shown below. This procedure takes a list and returns a copy
of the list.

    def copy_list(l) =
        if Nil?(l)
        then Nil
        else { a  = Hd(l);
               as = Tl(l);
               l' = copy_list(as);
               bs = MakeCons(a, l');
               in bs };


The result of a call to copy_list under the standard interpreter is a list whose length is the same
as the length of the input list l.


Abstract interpreters that do not use a store or retrieval function to model structures have difficulty 
abstracting recursive types. Under these interpreters, the result from a call to copy_list would be 
a potentially infinite representation of a list because all procedure calls and both branches of all 
conditionals are evaluated. Consequently, the abstract interpretation would not terminate unless 
some action was taken to bound the size of the representation of the list. 


There are three ways we can bound the sizes of the representations of recursive types in abstract 
interpretation. First, we can compress the domain a priori, as we did with the integer and boolean 
domains. Second, we can apply a generalization, or summarization, operator to such representa- 
tions. Third, we can structure our domains and interpreter so that we can guarantee that no values 
of unbounded size are ever constructed. 


8.2.1 Abstraction of Recursive Types by Domain Compression 


Much of the functional language community has taken the first approach to abstracting recursive 
types. For instance, Wadler compresses the abstract list domain into the following four elements 
for strictness analysis [42]: 


    $\top_{\in}$ : any finite list, no member of which is $\bot$

    $\bot_{\in}$ : any finite list, some member of which is $\bot$

    $\infty$ : any infinite list or approximation to one, except $\bot$

    $\bot$ : $\bot$


This list domain ensures that the abstract interpretation terminates in a finite amount of time, 
because all list objects have fixed size. 


The list domain defined by Wadler can only capture information about uniform lists. It cannot 
capture information about lists that may begin with a finite sequence of cons cells with non-uniform 
properties followed by a uniform list. Furthermore, it is difficult to see how to define appropriate 
abstract domains for other algebraic types based on this abstraction of lists. 


8.2.2 Abstraction of Recursive Types by Ad Hoc Object Compression


The second approach to limiting the size of the abstract representation of a recursively typed object
is to apply a compression operator to the representation: the operator generalizes the abstract value
representation and limits the number of nodes in the representation to some arbitrary bound k.
Figure 8.5 shows a representation of a list (a) before, and (b) after compression. The compressed
abstract list contains only two nodes.


Figure 8.5: Abstract list value before and after compression


The drawback to this approach is that it is difficult to choose the bound k on the object
representation's size that yields the best information for a particular program. In some cases the value of k
may be too large, resulting in extra overhead during analysis but not providing more information.
In other cases, the value of k is too small and useful information is obscured.


8.2.3 Abstraction of Recursive Types by Object Label Compression 


The third approach, and the approach taken in this thesis, is to structure the domains and in- 
terpreter in order to guarantee the finiteness of the representation of any list or algebraic type. 
The use of stores (or other retrieval functions) that map a finite number of abstract object labels 
to abstract storable values guarantees that the size of a recursively typed object representation 
remains finite. All nodes with identical labels are coalesced into a single node. Thus, there will
only be a finite number of distinct nodes in the representation of any object or group of objects. 


We defined our abstract activation labels, object labels, denotable values, storable values and stores 
so that we could analyze programs containing only tuples. We then augmented the storable value 
domains so that we could analyze arrays and non-recursive algebraic types. With this abstraction 
we can also analyze programs that use recursive types because the number of distinct abstract 
objects in a program is bounded by the size of the object label domain. The abstract object label 
domain is bounded in size by the number of paths through the call-graph of a program (disregarding 
cycles). 


8.2.4 Spatial Summarization in Recursively Typed Objects 


The abstraction of recursively typed objects involves a form of spatial summarization, as did the
abstraction of arrays. In arrays, we summarized a single object whose size was known only at run
time by an abstract object with a single component. In recursive types, we summarize a concrete
graph or tree containing an unknown number of nodes by a graph with a fixed number of abstract
nodes. Any two nodes in the concrete graph whose object labels map to the same abstract object
label will be summarized by a single abstract object.


For example, consider the type tree: 


type tree = node tree tree | leaf N; 


and the following value and store that represent a concrete tree object: 


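\[
\begin{array}{lcl}
\multicolumn{3}{l}{\alpha : l_0,} \\
\alpha : l_0 & \mapsto & (0_2\ \alpha.k_1 : l_0,\ \alpha.k_0 : l_0), \\
\alpha.k_1 : l_0 & \mapsto & (0_2\ \alpha.k_1.k_1 : l_1,\ \alpha.k_1.k_0 : l_1), \\
\alpha.k_0 : l_0 & \mapsto & (0_2\ \alpha.k_0.k_1 : l_1,\ \alpha.k_0.k_0 : l_0), \\
\alpha.k_0.k_0 : l_0 & \mapsto & (0_2\ \alpha.k_0.k_0.k_1 : l_1,\ \alpha.k_0.k_0.k_0 : l_1), \\
\alpha.k_1.k_1 : l_1 & \mapsto & (1_2\ 1), \\
\alpha.k_1.k_0 : l_1 & \mapsto & (1_2\ 2), \\
\alpha.k_0.k_1 : l_1 & \mapsto & (1_2\ 3), \\
\alpha.k_0.k_0.k_1 : l_1 & \mapsto & (1_2\ 4), \\
\alpha.k_0.k_0.k_0 : l_1 & \mapsto & (1_2\ 5)
\end{array}
\]


This tree consists of 4 interior nodes and 5 leaf nodes.


If we abstract the activation labels appearing in this representation to the following set:

\[ \{\alpha.(k_0 + k_1)^{*}\}, \]

then the abstract tree representation collapses to the following 2-node abstract value and store:


\[
\begin{array}{l}
\{\alpha.(k_0 + k_1)^{*} : l_0\}, \\
\alpha.(k_0 + k_1)^{*} : l_0 \mapsto
(\mathtt{Oneof}\ (0\ \{\alpha.(k_0 + k_1)^{*} : l_0,\ \alpha.(k_0 + k_1)^{*} : l_1\},\ \{\alpha.(k_0 + k_1)^{*} : l_0,\ \alpha.(k_0 + k_1)^{*} : l_1\}),\ \bot), \\
\alpha.(k_0 + k_1)^{*} : l_1 \mapsto (\mathtt{Oneof}\ \bot, (1\ N))
\end{array}
\]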


This representation consists of only two abstract nodes. The abstraction of activation labels is 
normally derived from the call-graph of a program. 
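

The collapse of the nine-node tree into two abstract nodes can be reproduced with a short Python
sketch. The encoding is an assumption made for illustration: a concrete label is a pair of an
activation path and an allocation site, the abstraction simply forgets the path (all paths fall
into the single abstract activation label), and an abstract node is a pair of an interior-node
component and a leaf component. The names abstract_label and abstract_store are hypothetical.

```python
def abstract_label(label):
    """Forget the activation path: every path maps to one abstract activation label."""
    _path, site = label
    return site


def abstract_store(store):
    """Join every concrete node into the abstract node that shares its label."""
    result = {}
    for label, node in store.items():
        a = abstract_label(label)
        kids, leaf = result.get(a, (None, None))
        if node[0] == "node":
            left, right = kids or (frozenset(), frozenset())
            kids = (left | {abstract_label(node[1])},
                    right | {abstract_label(node[2])})
        else:
            leaf = "N"  # the integer stored in a leaf abstracts to N
        result[a] = (kids, leaf)
    return result


tree = {
    ((), "l0"): ("node", (("k1",), "l0"), (("k0",), "l0")),
    (("k1",), "l0"): ("node", (("k1", "k1"), "l1"), (("k1", "k0"), "l1")),
    (("k0",), "l0"): ("node", (("k0", "k1"), "l1"), (("k0", "k0"), "l0")),
    (("k0", "k0"), "l0"): ("node", (("k0", "k0", "k1"), "l1"),
                           (("k0", "k0", "k0"), "l1")),
    (("k1", "k1"), "l1"): ("leaf", 1),
    (("k1", "k0"), "l1"): ("leaf", 2),
    (("k0", "k1"), "l1"): ("leaf", 3),
    (("k0", "k0", "k1"), "l1"): ("leaf", 4),
    (("k0", "k0", "k0"), "l1"): ("leaf", 5),
}
print(abstract_store(tree))
```

Both child components of the abstract interior node end up holding the labels of the interior node
and of the leaf node, so the abstract graph is cyclic even though the concrete tree is not.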


We do not add anything to the abstract domains or the abstract interpreter in this section. The 
complexity we added in the basic framework has paid off by being general enough to handle recur- 
sively typed objects. In the following section, we describe an extension to the abstract interpreter 
to model lists as a special case of Oneofs. 


8.3 Abstraction of Lists in KID~


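The list type, which is a particular recursive algebraic type, could be modeled using our Oneof
domain. However, our implementation of the deallocation command insertion algorithm generates
specialized code to deallocate lists, so we model lists separately in our abstract interpreter.


\[
\begin{array}{lcl}
\mathit{Abs}_{List}((\mathtt{Cons}\ v_0, v_1)) & = & (\mathtt{List}\ (\mathtt{Cons}\ \mathit{Abs}_V(v_0), \mathit{Abs}_V(v_1)), \bot) \\
\mathit{Abs}_{List}((\mathtt{Nil})) & = & (\mathtt{List}\ \bot, (\mathtt{Nil}))
\end{array}
\]

Figure 8.6: List abstraction operator


\[
\begin{array}{lcl}
(\mathtt{List}\ c_0, n_0) \sqcup_{List} (\mathtt{List}\ c_1, n_1) & = & (\mathtt{List}\ (c_0 \sqcup_{Cons} c_1), (n_0 \sqcup_{Nil} n_1)) \\
(\mathtt{Cons}\ v_0, v_1) \sqcup_{Cons} (\mathtt{Cons}\ w_0, w_1) & = & (\mathtt{Cons}\ (v_0 \sqcup_V w_0), (v_1 \sqcup_V w_1)) \\
(\mathtt{Nil}) \sqcup_{Nil} (\mathtt{Nil}) & = & (\mathtt{Nil})
\end{array}
\]

Figure 8.7: List least upper bound operators


\[
\begin{array}{lcl}
(\mathtt{List}\ c_0, n_0) \sqsubseteq_{List} (\mathtt{List}\ c_1, n_1) & = & (c_0 \sqsubseteq_{Cons} c_1) \wedge (n_0 \sqsubseteq_{Nil} n_1) \\
(\mathtt{Cons}\ v_0, v_1) \sqsubseteq_{Cons} (\mathtt{Cons}\ w_0, w_1) & = & (v_0 \sqsubseteq_V w_0) \wedge (v_1 \sqsubseteq_V w_1) \\
(\mathtt{Nil}) \sqsubseteq_{Nil} (\mathtt{Nil}) & = & \mathit{True}
\end{array}
\]

Figure 8.8: List ordering operators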


8.3.1 Abstract List Domains 


The definition of the abstracted list domain that we use follows: 


\[
\begin{array}{lcll}
v_{list} \in \mathit{List} & = & (\mathtt{List}\ (\mathtt{Cons}\ V, Ls)_{\bot}, (\mathtt{Nil})_{\bot}) & \text{Lists} \\
sv \in SV & = & (\mathit{Tuple} + \mathit{Array} + \mathit{Oneof} + \mathit{List})_{\bot} & \text{Storable Values}
\end{array}
\]


Abstract lists, like abstract oneofs, are represented by a pair of tagged disjuncts. If one of these
components is bottom, then none of the concrete values represented by this abstract list can be the
corresponding disjunct (Cons or Nil). If both of these are non-bottom, then the corresponding standard
values could be either Cons or Nil.


Figure 8.6 contains the function $\mathit{Abs}_{List}$, which maps standard list values into abstract list values.
Figure 8.7 contains the least upper bound operator on abstract lists, and Figure 8.8 contains the
ordering operator for abstract lists.


This list abstraction is safe, in that it preserves the reachability of the list elements. Abstract 
list representations may suffer from spatial summarization and lose information about whether the 
cons cells are shared. If a list is constructed from distinct invocations of Cons, then the analysis 
will obtain complete information (as with tuples). However, if a list is constructed by a recursive 
procedure, then the calls to Cons will not have distinct activation labels, and a cyclic representation 
will be constructed. 
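

The way a recursive procedure produces a cyclic abstract list can be seen in the following Python
sketch. It is an illustration under assumptions of our own: an abstract list is a pair
(cons, nil) in which cons is either None (bottom) or a pair of a head value and a set of tail
labels, nil is a boolean flag, and heads are always the abstract number "N". The loop imitates a
recursive list-building procedure whose Nil is allocated at label l0 and whose Cons is allocated at
label l1; all names are hypothetical.

```python
BOT = None  # stands in for a bottom disjunct


def join_list(a, b):
    """Least upper bound of two abstract lists (cons disjunct, nil flag)."""
    (c0, n0), (c1, n1) = a, b
    if c0 is BOT:
        cons = c1
    elif c1 is BOT:
        cons = c0
    else:
        assert c0[0] == c1[0]  # sketch only: heads are always "N"
        cons = (c0[0], c0[1] | c1[1])
    return (cons, n0 or n1)


def abs_cons(store, ol, head, tail_labels):
    """Abstract Cons: join the new cell with whatever is already stored at ol."""
    old = store.get(ol, (BOT, False))
    return {**store, ol: join_list(((head, frozenset(tail_labels)), False), old)}


def abs_nil(store, ol):
    """Abstract Nil: join a nil disjunct into the object stored at ol."""
    old = store.get(ol, (BOT, False))
    return {**store, ol: join_list((BOT, True), old)}


store = abs_nil({}, "l0")
result = {"l0"}  # labels the procedure may return so far (base case only)
while True:
    new_store = abs_cons(store, "l1", "N", result)
    new_result = result | {"l1"}
    if new_store == store and new_result == result:
        break  # fixed point reached
    store, result = new_store, new_result
print(store)
```

At the fixed point the cell at l1 lists l1 among its own tail labels, which is exactly the cyclic
representation described above.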


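The compiler must assume that any list whose abstract representation is cyclic may represent a
cyclic list. The compiler must also assume that the objects pointed to by a cyclic abstract list may
be shared. Therefore, if the compiler inserts code to deallocate a list, it must ensure that each
distinct cons cell in the list is deallocated only once.


\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{l}\mathtt{Cons}(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![\ se_1\ ]\!]\,\rho; \\
\quad\ \ v_2 = \mathcal{SE}_A[\![\ se_2\ ]\!]\,\rho; \\
\quad\ \ v_{list} = (\mathtt{List}\ (\mathtt{Cons}\ v_1, v_2), \bot); \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v'_{list} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{list} \sqcup v'_{list})] \\
\quad \mathbf{in}\ (\,\{ol\}, \sigma', \emptyset, \bot_{PMap}\,)\ \} \\
\\
\mathcal{E}_A[\![\ \mathtt{Hd}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\ se\ ]\!]\,\rho; \\
\quad\ \ (\mathtt{List}\ (\mathtt{Cons}\ v_1, v_2), (\mathtt{Nil})) = (\mathtt{List}\ (\mathtt{Cons}\ \bot, \bot), \bot) \sqcup \bigsqcup_{ol \in ls} \sigma[ol] \\
\quad \mathbf{in}\ (\,v_1, \sigma, \emptyset, \bot_{PMap}\,)\ \} \\
\\
\mathcal{E}_A[\![\ \mathtt{Tl}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\ se\ ]\!]\,\rho; \\
\quad\ \ (\mathtt{List}\ (\mathtt{Cons}\ v_1, v_2), (\mathtt{Nil})) = (\mathtt{List}\ (\mathtt{Cons}\ \bot, \bot), \bot) \sqcup \bigsqcup_{ol \in ls} \sigma[ol] \\
\quad \mathbf{in}\ (\,v_2, \sigma, \emptyset, \bot_{PMap}\,)\ \} \\
\\
\mathcal{E}_A[\![\ {}^{l}\mathtt{Nil}()\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_{list} = (\mathtt{List}\ \bot, (\mathtt{Nil})); \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v'_{list} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{list} \sqcup v'_{list})] \\
\quad \mathbf{in}\ (\,\{ol\}, \sigma', \emptyset, \bot_{PMap}\,)\ \} \\
\\
\mathcal{E}_A[\![\ \mathtt{Nil?}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = (\,B, \sigma, \emptyset, \bot_{PMap}\,)
\end{array}
\]

Figure 8.9: Abstracted evaluation of list primitives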


8.3.2 Additions to Abstract Interpreter 


Figure 8.9 contains the evaluation rules for list primitives in the abstract interpreter. They are 
similar to the instrumented evaluation rules, except that labels are abstracted and lists are products 
of Cons and Nil objects. 


8.3.3 Representative List Inputs 


To use our deallocation command safety verification algorithm and deallocation command insertion
algorithm, we must have a method to construct representative input values for objects of recursive
types. This section defines the clause of procedure CV that constructs representative list arguments.


The values we pass as inputs to procedures under analysis must be of the correct type and must be
detectable wherever they are passed within the procedure. This constrains the abstract list values
that we may use as representative inputs to a function expecting a list. We cannot tell a priori how
many of the cons cells in a list input are dereferenced by a procedure; consequently, the abstract
list must either have infinitely many cons cells or be circular. We use circular list representations
as representative inputs. The clause of procedure CV that constructs representative list values,
shown below, returns a circular abstract list whose head contains an element of the correct type.


\[
\begin{array}{l}
\mathcal{CV}[\![\ \mathtt{List}\ \tau\ ]\!] = \\
\quad \{\ (v, \sigma') = \mathcal{CV}[\![\ \tau\ ]\!]; \\
\quad\ \ l = \mathit{NewLoc}(); \\
\quad\ \ \sigma'' = \sigma'[l \mapsto (\mathtt{List}\ (\mathtt{Cons}\ v, \{l\}), (\mathtt{Nil}))] \\
\quad \mathbf{in}\ (\,\{l\}, \sigma''\,)\ \}
\end{array}
\]


The rationale for the safety of using circular lists as inputs is that any particular list under the
standard interpreter composed of a finite number of cons cells labeled $l_0$ through $l_n$ and containing
values $v_0$ through $v_n$ in the heads of each cons can be contained by a cyclic abstract list. We say
list representation $r_0$ contains $r_1$ if $r_1 \sqsubseteq r_0$. The abstraction of such a list is contained by the
following abstract value-store pair:


\[
(\ \{l_0, \ldots, l_n\},\ \bot_{Store}\left[\begin{array}{l}
l_0 \mapsto (\mathtt{List}\ (\mathtt{Cons}\ \hat{v}, \{l_0, \ldots, l_n\}), (\mathtt{Nil})) \\
\quad \vdots \\
l_n \mapsto (\mathtt{List}\ (\mathtt{Cons}\ \hat{v}, \{l_0, \ldots, l_n\}), (\mathtt{Nil}))
\end{array}\right]\ )
\]


where $\hat{v}$ is the least upper bound of the elements of the concrete list, $v_0$ through $v_n$, and $\{l_0, \ldots, l_n\}$
is the abstraction of the locations where the concrete list resides.


Therefore, we can show that the analysis of a function using a circular abstract list representation 
is safe for any list to which the function is actually applied. We show this by substituting the set of 
labels of the Cons cells in the actual list for the label of the Cons in the circular list and substituting 
the least upper bound of the elements of the actual list for the element of the representative list. 
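
The construction of a circular representative list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of abstract lists, the label format, and the names `newloc` and `representative_list` are hypothetical and do not come from the implementation described in this thesis.

```python
from itertools import count

_labels = count()  # hypothetical supply of fresh abstract location labels


def newloc():
    """Stand-in for Newloc: return a label that has not been used before."""
    return f"l{next(_labels)}"


def representative_list(elem, store):
    """Return (label set, store) for a circular abstract list of `elem`.

    An abstract list is modelled as ("List", cons_part, nil_part), where
    cons_part is ("Cons", head, tail_labels) and nil_part is "Nil".
    The tail label set points back at the cell itself, making it circular.
    """
    l = newloc()
    cell = ("List", ("Cons", elem, frozenset({l})), "Nil")
    return frozenset({l}), {**store, l: cell}


refs, store = representative_list("N", {})
(label,) = refs
```

Because the tail of the single cons cell refers to the cell itself, a procedure may dereference the tail any number of times and always finds a cell of the correct type.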


8.3.4 List Examples 


Now let us examine a couple of examples that use lists, in order to see what information we can 
gather and how far we can go with them. Procedure scale_list takes a number and a list of 
numbers and returns a new list of scaled numbers; procedure inc_list takes a number and a list 
of numbers and returns a list of incremented numbers. 


def scale_list (s, l) =
  if nil?(l) then nil
  else { x  = hd(l);
         xs = tl(l);
         x' = s * x;
         sl = scale_list(s, xs);
         r  = Cons(x', sl);
       in r };


def inc_list (delta, l) =
  if nil?(l) then nil
  else { x  = hd(l);
         xs = tl(l);
         x' = delta + x;
         sl = inc_list(delta, xs);
         r  = Cons(x', sl);
       in r };


The representative input vector for scale_list is


\[
(\ N,\ \{l_{-1}\},\ \bot_{Store}[\, l_{-1} \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_{-1}\} \rangle, \langle \mathsf{Nil} \rangle \rangle \,]\ )
\]


This represents a number and a list whose head is a number and whose tail is either nil or the list 
itself. 


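The value returned from scale_list is another list, consisting of a cons in location $l_1$ or nil in
location $l_0$:


\[
\left(\ \{l_0, l_1\},\ \bot_{Store}\left[
\begin{array}{l}
l_{-1} \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_{-1}\} \rangle, \langle \mathsf{Nil} \rangle \rangle \\
l_0 \mapsto \langle \mathsf{List}\ \bot, \langle \mathsf{Nil} \rangle \rangle \\
l_1 \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_0, l_1\} \rangle, \bot \rangle
\end{array}
\right]\ \right)
\]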


From this value we can determine that the list passed into function scale_list cannot be reached 
from its result. Furthermore, we know that the result is a list that must have been allocated within
the call to scale_list. 


We obtain similar results when we analyze procedure inc_list. What is the behavior if we compose 
these two functions, as in procedure inc_scale_list? 


def inc_scale_list (delta, s, x) =
  { x' = scale_list(s, x);
    r  = inc_list(delta, x');
  in r };


If we evaluate the bindings in the body of inc_scale_list when applied to the following input 
vector: 


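\[
(\ N,\ N,\ \{l_{-2}\},\ \bot_{Store}[\, l_{-2} \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_{-2}\} \rangle, \langle \mathsf{Nil} \rangle \rangle \,]\ )
\]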


we obtain the following values: 


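\[
\begin{array}{rcl}
\rho' &=& \left[
\begin{array}{lcl}
delta &\mapsto& N \\
s &\mapsto& N \\
x &\mapsto& \{l_{-2}\} \\
x' &\mapsto& \{l_0, l_1\} \\
r &\mapsto& \{l_2, l_3\}
\end{array}
\right] \\[1ex]
\sigma' &=& \bot_{Store}\left[
\begin{array}{l}
l_{-2} \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_{-2}\} \rangle, \langle \mathsf{Nil} \rangle \rangle \\
l_0 \mapsto \langle \mathsf{List}\ \bot, \langle \mathsf{Nil} \rangle \rangle \\
l_1 \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_0, l_1\} \rangle, \bot \rangle \\
l_2 \mapsto \langle \mathsf{List}\ \bot, \langle \mathsf{Nil} \rangle \rangle \\
l_3 \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_2, l_3\} \rangle, \bot \rangle
\end{array}
\right] \\[1ex]
I &=& \{l_{-2}\} \\
R &=& \{l_2, l_3\}
\end{array}
\]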


The set $I$ of labels reachable from the body of inc_scale_list contains $l_{-2}$, and $R$, the set of labels
reachable from the result of inc_scale_list's body, contains $l_2$ and $l_3$. From this we can conclude
that the object to which $x'$ is bound, consisting of locations $l_0$ and $l_1$, must have been allocated
within the body of inc_scale_list and that this object does not escape from there. Consequently,
we can insert a deallocation command for variable $x'$.
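
The reasoning in this example amounts to a disjointness test on sets of reachable labels. The Python sketch below illustrates that test on the abstract store of the example; the store encoding, the label spellings, and the function names `reachable` and `may_deallocate` are assumptions made for illustration, and only tail references are followed because the list elements here are scalars.

```python
def reachable(labels, store):
    """Labels reachable from `labels` by following cons tail sets."""
    seen, work = set(), list(labels)
    while work:
        l = work.pop()
        if l in seen:
            continue
        seen.add(l)
        cons, _nil = store[l]          # cons is None when the cons part is bottom
        if cons is not None:
            _head, tails = cons
            work.extend(tails)
    return seen


def may_deallocate(var_labels, input_labels, result_labels, store):
    """True if nothing reachable from the variable is visible to inputs or result."""
    live = reachable(input_labels, store) | reachable(result_labels, store)
    return reachable(var_labels, store).isdisjoint(live)


# Abstract store of the inc_scale_list example: label -> (cons part, may be nil)
STORE = {
    "l-2": (("N", {"l-2"}), True),
    "l0": (None, True),
    "l1": (("N", {"l0", "l1"}), False),
    "l2": (None, True),
    "l3": (("N", {"l2", "l3"}), False),
}

x_prime_ok = may_deallocate({"l0", "l1"}, {"l-2"}, {"l2", "l3"}, STORE)
r_ok = may_deallocate({"l2", "l3"}, {"l-2"}, {"l2", "l3"}, STORE)
```

Under these assumptions `x_prime_ok` is true, matching the conclusion above, while `r_ok` is false because the result itself escapes.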


What should this deallocation command do? The whole list that has been allocated is garbage, but 
we cannot determine the size of the list from the abstract values. Nor can we determine if there is 
any sharing in the list. We can insert code to deallocate this whole list as long as the code checks
that it never deallocates the same cons cell twice along the spine of the list. 
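
A minimal Python sketch of such a spine walk follows. It is not the code the compiler generates; the `Cons` class and the `free` callback are hypothetical stand-ins for run-time cons cells and the heap manager's release operation.

```python
class Cons:
    """A mutable cons cell; `tail` is another Cons or None for nil."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def deallocate_list(cell, free):
    """Release each distinct cons cell on the spine exactly once.

    Stops at nil or when the walk returns to a cell already released,
    so it terminates on cyclic lists as well as acyclic ones.
    """
    seen = set()            # cells hash by identity by default
    released = 0
    while cell is not None and cell not in seen:
        seen.add(cell)
        next_cell = cell.tail   # read the tail before releasing the cell
        free(cell)
        released += 1
        cell = next_cell
    return released


# A three-element cyclic list: a -> b -> c -> a
c = Cons(3)
b = Cons(2, c)
a = Cons(1, b)
c.tail = a
freed = []
count_cyclic = deallocate_list(a, freed.append)
count_acyclic = deallocate_list(Cons(1, Cons(2)), freed.append)
```

The visited set is the cost of handling possible cycles; a list known to be acyclic could be released with a plain loop and no bookkeeping.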


Consider the following example, which constructs a circular list: 


def cyclic_list (elt) =
  { r = Cons(elt, r);
  in r };


Procedure cyclic_list, applied to N, returns the value 


\[
(\ \{l_4\},\ \bot_{Store}[\, l_4 \mapsto \langle \mathsf{List}\ \langle \mathsf{Cons}\ N, \{l_4\} \rangle, \bot \rangle \,]\ )
\]


Because the nil component of the list in location $l_4$ is bottom, we can conclude that this list never
has a null tail — it is either infinite or cyclic. However, we still cannot tell how many cons cells 
the list will have under the standard interpreter. 


Whenever we determine that the lifetime of a list is bounded by some control region, we will insert 
code that recursively deallocates all distinct cons cells of the list upon termination of that control 
region. In Chapter 10, we discuss the run-time performance of the code that deallocates potentially 
cyclic lists compared to the code that deallocates acyclic lists. 


Chapter 9 


Higher-Order Functions 


Many modern programming languages have higher-order functions. That is, one can pass procedures
around as values. Procedures can take procedures as arguments and return procedures
as values. This ability to pass procedures around provides a great deal of flexibility in writing
programs.


Unfortunately, many approaches to lifetime analysis that use abstract interpretation do not model 
higher-order functions. One of the main difficulties in the abstraction of procedure values is how 
to take the least upper bound of two functions. The least upper bound is well-defined theoretically 
as long as the two functions have the same domains and ranges. If we have functions $f_0$ and $f_1$,
then the least upper bound can be defined as a new function:


\[
f_0 \sqcup f_1 = \lambda x.\,( f_0(x) \sqcup f_1(x) )
\]
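
As a small illustration of the pointwise definition, the Python sketch below takes the least upper bound of two functions whose results live in a powerset lattice. The choice of lattice (sets ordered by inclusion, with union as the join) and the names `lub_sets` and `lub_fun` are assumptions made only for this example.

```python
def lub_sets(a, b):
    """Least upper bound in a powerset lattice: set union."""
    return a | b


def lub_fun(f0, f1, lub=lub_sets):
    """Pointwise least upper bound of two functions with the same domain."""
    return lambda x: lub(f0(x), f1(x))


f0 = lambda x: frozenset({x})
f1 = lambda x: frozenset({x + 1})
joined = lub_fun(f0, f1)
```

The result `joined` is itself an opaque function, which hints at why this definition is awkward to implement: two such values cannot be compared or enumerated without applying them.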


However, this definition is not always conducive to an implementation. The key here is to separate 
out the text of the function from the object being passed around as a value. The approach taken 
in this thesis is to represent functions as closures which consist of the name of a function and the 
values the function is closed over. The name of a function points to the text of the function — its 
definition in the program. We allow a prefix of a function’s arguments to be provided in a closure 
of the function (a partial application), and the rest must be provided when the closure is applied. 


The second major difficulty in the abstraction of procedure values is that the domain of functions 
of a given type is infinite, and so it is no longer possible to enumerate the input-output behaviors 
of a function that takes procedures as arguments over all possible procedures in the domain. We 
limit the domains of functions to contain only closures of functions that are defined in the program. 
There can only be finitely many functions defined in a program, and only finitely many points 
where those functions are closed over values. 


In this chapter, we see how to add higher-order functions to KID~ and to improve the abstraction of 
activation labels. In the first two sections we discuss the implementation of higher-order functions 
in the KID~ interpreters. In the final section we present an analysis example using higher-order
functions. 


9.1 Higher-Order Functions in the Instrumented Interpreter 


In this section, we discuss the changes that need to be made to the domains and interpreters in 
order to add higher-order functions to KID~. We do this for the instrumented and abstracted 
interpreters. In order to add higher-order functions, we need to add primitives to the language
to create and apply function values, and we need to add value domains for representing function 
values. 


9.1.1 The Closure Domain 


A higher-order function consists of the text of a procedure plus the lexical environment in which
the function was defined. Higher-order functions are most interesting when these functions can be
defined in lexical environments other than the global environment. Rather than extending KID~
with nested function definitions, however, we preserve the flat structure of definitions and introduce
a primitive that binds together a particular procedure definition and values from the desired lexical
environment. We represent functions as closures. Closures are a new kind of storable value.


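\[
\begin{array}{rcll}
cls \in Cls &=& \langle \mathsf{Cls}\ F, V, \ldots, V \rangle & \text{Closure} \\
sv \in SV &=& Tuple + Array + List + Cls & \text{Storable Values}
\end{array}
\]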


A closure consists of a tuple of a procedure name $f_i$ and $n$ values. If a function $f_i$ has $r$ arguments,
then a closure of $f_i$ over $n$ values, where $n < r$, may be applied to exactly $(r - n)$ values.


9.1.2 Instrumented Interpretation of Closure Primitives


We add two primitives to KID~ for creating and manipulating closures: $\mathsf{MakeClosure}_{f_i}$,
which closes the procedure named $f_i$ over some set of argument values, and Apply,
which applies a closure to a set of values. These primitives do not support currying directly.
Instead, the compiler can generate a sequence of intermediate functions that use MakeClosure
and Apply to implement currying. This is described fully in Hochheiser [21].


The primitive MakeClosure is subscripted with the name $f_i$ of the function being closed, and takes
$n$ values over which $f_i$ is being closed. MakeClosure is similar to the MakeTuple primitive in that
the expression label is used to construct a unique label $ol$ of the structure being allocated. Note
also that the set of allocation events is augmented to show that $ol$ was allocated in the current
activation $a$.


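\[
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{l}\mathsf{MakeClosure}_{f_i}(se_1, \ldots, se_n) \,]\!]\ \rho\ \sigma\ a = \\
\quad \{\ v_1 = \mathcal{SE}_I[\![\, se_1 \,]\!]\ \rho; \\
\qquad \vdots \\
\qquad v_n = \mathcal{SE}_I[\![\, se_n \,]\!]\ \rho; \\
\qquad ol = a{:}l; \\
\qquad \sigma' = \sigma[\, ol \mapsto \langle \mathsf{Cls}\ f_i, v_1, \ldots, v_n \rangle \,]; \\
\quad \mathbf{in}\ (ol, \sigma', \{(ol, a)\}, \emptyset, \emptyset)\ \}
\end{array}
\]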


The primitive Apply takes as its first argument a closure of a function $f_i$ with arity $r$ and $n$ values
over which the function is closed. There must be $(r-n)$ more values supplied to Apply, so that it can
make a full-arity application of function $f_i$. This primitive is similar to user function applications.
First, we evaluate the arguments and dereference the closure from the incoming store. Then we
evaluate the body of the closed function $f_i$ in the new activation $a'$ and the proper environment,
constructed from the values from the closure and from the inputs to Apply.


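\[
\begin{array}{l}
\mathcal{E}_I[\![\, {}^{k}\mathsf{Apply}(se, se_{n+1}, \ldots, se_r) \,]\!]\ \rho\ \sigma\ a = \\
\quad \{\ ol = \mathcal{SE}_I[\![\, se \,]\!]\ \rho; \\
\qquad v_{n+1} = \mathcal{SE}_I[\![\, se_{n+1} \,]\!]\ \rho; \\
\qquad \vdots \\
\qquad v_r = \mathcal{SE}_I[\![\, se_r \,]\!]\ \rho; \\
\qquad \langle \mathsf{Cls}\ f_i, v_1, \ldots, v_n \rangle = \sigma[ol]; \\
\qquad a' = a.k; \\
\qquad (v, \sigma', A^{+}, A^{-}, A^{R}) = \mathcal{E}_I[\![\, e_i \,]\!]\ (\rho[v_1/x_1, \ldots, v_n/x_n, v_{n+1}/x_{n+1}, \ldots, v_r/x_r])\ \sigma\ a'; \\
\qquad A^{R\prime} = A^{R} \cup \{(ol, a)\}; \\
\quad \mathbf{in}\ (v, \sigma', A^{+}, A^{-}, A^{R\prime})\ \}
\end{array}
\]

where $f_i(x_1, \ldots, x_r) = e_i$ is a definition in the program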


We return a reference event for the closure object $ol$ in activation $a$.
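
The behavior of the two primitives can be sketched in Python as follows. This is a simplified illustration, not the interpreter of this thesis: the `PROGRAM` table, the example function `add3`, and the encoding of object labels as (activation, expression label) pairs are all assumptions, and the body is evaluated directly rather than in a new activation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Closure:
    fname: str      # name of the closed procedure
    values: tuple   # the n values the procedure is closed over


# Hypothetical program: name -> (arity r, body as a Python function)
PROGRAM = {"add3": (3, lambda x, y, z: x + y + z)}


def make_closure(fname, values, store, activation, label):
    """Allocate a closure; returns (object label, new store, allocation events)."""
    arity, _body = PROGRAM[fname]
    assert len(values) < arity, "a closure holds a strict prefix of the arguments"
    ol = (activation, label)
    new_store = {**store, ol: Closure(fname, tuple(values))}
    return ol, new_store, {(ol, activation)}


def apply_closure(ol, args, store, activation):
    """Apply a closure to exactly (r - n) values; returns (result, reference events)."""
    clo = store[ol]
    arity, body = PROGRAM[clo.fname]
    if len(clo.values) + len(args) != arity:
        raise TypeError("Apply must supply exactly r - n arguments")
    return body(*clo.values, *args), {(ol, activation)}


ol, store, allocs = make_closure("add3", (1, 2), {}, ("main",), "l0")
result, refs = apply_closure(ol, (3,), store, ("main",))
```

The allocation event is recorded when the closure is created and the reference event when it is applied, mirroring the event sets returned by the two clauses.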


9.2 Higher-Order Functions in the Abstract Interpreter 


This section defines the abstract closure domains and the clauses of the abstract interpreter that 
interpret the MakeClosure and Apply primitives. 


9.2.1 Abstracting The Closure Domain 


Abstraction of the closure domain is rather straightforward. We do not attempt to abstract the 
code text of a closure. Rather, we generalize a closure to the set of possible closures that it could be. 
This abstraction fits in nicely with our abstraction of storable values: a reference to the abstraction 
of a closure is a set of abstract object labels, each of which refers to an abstract closure. An 
abstract closure storable value consists of a single function name and a tuple of values over which 
that function is closed. The number of components in the tuple must be less than the number of 
arguments that the function takes. 


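\[
\begin{array}{rcll}
v \in V &=& (N + B + Ols)_{\bot} & \text{Denotable values} \\
cls \in Cls &=& \langle \mathsf{Cls}\ F, V, \ldots, V \rangle & \text{Abstract Closure} \\
sv \in SV &=& Tuple + Array + List + Cls & \text{Storable Values}
\end{array}
\]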


An application of a reference to a set of abstract closures has to return the least upper bound of 
the values returned by applying each of the abstract closures to the abstract argument values. 


In addition to abstracting the closure domain, we must choose a domain of activation labels. We
have seen two choices for $AL$ so far, the simple one from Chapter 4 and the more detailed one
from Chapter 6. The more detailed abstraction requires knowledge of the complete call graph
in order to define a function $\mathcal{NAL}$ that takes an abstract activation label and an expression label
and returns a new abstract activation label. We cannot statically compute the call graph of a program
that uses higher-order functions, because the names of the functions that will be invoked by an
application primitive are not known, in general, until run-time.


\[
Abs_{Cls}(\langle \mathsf{Cls}\ f_i, v_1, \ldots, v_n \rangle) =
\langle \mathsf{Cls}\ f_i, Abs_{V}(v_1), \ldots, Abs_{V}(v_n) \rangle
\]


Figure 9.1: Closure abstraction operator


For programs with higher-order functions, we use a simpler definition of the $AL$ domain and a
simple function $\mathcal{NAL}$ to compute the next activation label. The definition of the domain, shown
below, is the same as the $AL$ domain used in the standard and instrumented interpreters.


\[
AL = \epsilon \mid AL.L
\]


However, the next activation label function guarantees that the set of activation labels remains
finite. The set of activation labels is finite except for recursive functions, so $\mathcal{NAL}$ treats the
activation labels of recursive function calls specially:


\[
\mathcal{NAL}(a, k) =
\begin{cases}
a'.k & \text{if } a = a'.k.\beta \\
a.k & \text{otherwise}
\end{cases}
\]


The motivation for this definition of $\mathcal{NAL}$ is that the activation labels of procedures that are
called recursively will contain repeated expression labels. Under the standard interpreter, the next
activation label from $a = a'.k.\beta$ given expression label $k$ would be:


\[
a.k = a'.k.\beta.k
\]

so the function $\mathcal{NAL}$ limits this to one occurrence of $k$:

\[
\mathcal{NAL}(a, k) = a'.k
\]


by eliding the sequence of expression labels $k.\beta$. Note that $\beta$ is empty for singly recursive functions,
and it is non-empty for functions that contain multiple recursive calls or for groups of mutually
recursive functions.
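
A direct Python transcription of this definition is shown below as a sketch. Representing an activation label as a tuple of expression labels, and the function name `next_activation_label`, are assumptions made for illustration.

```python
def next_activation_label(a, k):
    """Next activation label with at most one occurrence of k.

    If a has the form a'.k.beta, return a'.k (eliding beta and the
    repeated k); otherwise append k to a.
    """
    if k in a:
        return a[: a.index(k) + 1]
    return a + (k,)


first_call = next_activation_label(("m",), "k")            # no k yet: append
self_recursive = next_activation_label(("m", "k"), "k")    # beta is empty
mutual = next_activation_label(("m", "k", "j"), "k")       # beta is ("j",)
```

Since every label produced this way contains each expression label at most once, and a program has finitely many expression labels, only finitely many activation labels can arise.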


An alternative definition of $\mathcal{NAL}$, which may be more desirable because it further restricts the size
of the activation label domain, is defined below.


\[
\mathcal{NAL}(a, k) = k
\]


This definition may yield detailed enough activation labels for most purposes. 


Of course, one could use the original definition of abstract activation labels from Chapter 4, which
corresponds to the following definition of $\mathcal{NAL}$.


\[
\mathcal{NAL}(a, k) = \epsilon
\]


This definition yields the smallest possible domain of activation labels. 


Figure 9.1 contains the function $Abs_{Cls}$, which maps standard closure values into abstract closure
values. Figure 9.2 contains the least upper bound operator on abstract closures, and Figure 9.3 
contains the ordering operator for abstract closures. 
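

\[
\langle \mathsf{Cls}\ f, v_1, \ldots, v_n \rangle \sqcup_{Cls} \langle \mathsf{Cls}\ g, w_1, \ldots, w_n \rangle =
\begin{cases}
\langle \mathsf{Cls}\ f, (v_1 \sqcup_{V} w_1), \ldots, (v_n \sqcup_{V} w_n) \rangle & \text{if } f = g \\
\top & \text{otherwise}
\end{cases}
\]


Figure 9.2: Closure least upper bound operators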


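\[
\langle \mathsf{Cls}\ f, v_1, \ldots, v_{n_f} \rangle \sqsubseteq_{Cls} \langle \mathsf{Cls}\ g, w_1, \ldots, w_{n_g} \rangle =
\begin{cases}
\bigwedge_{i} (v_i \sqsubseteq_{V} w_i) & \text{if } f = g \text{ and } n_f = n_g \\
\top & \text{otherwise}
\end{cases}
\]


Figure 9.3: Closure ordering operators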
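
The two operators can be sketched in Python as follows. Several things here are assumptions made for illustration: abstract closures are encoded as (name, values) pairs, component values are drawn from a powerset lattice (so the component join is union and the component ordering is inclusion), `TOP` is a sentinel, and the ordering test simply reports that closures of different functions are not related.

```python
TOP = "TOP"  # sentinel for the top element of the closure domain


def lub_cls(c1, c2):
    """Least upper bound of two abstract closures (not TOP themselves)."""
    (f, vs), (g, ws) = c1, c2
    if f == g and len(vs) == len(ws):
        return (f, tuple(v | w for v, w in zip(vs, ws)))
    return TOP


def leq_cls(c1, c2):
    """Componentwise ordering for closures of the same function and size."""
    (f, vs), (g, ws) = c1, c2
    return f == g and len(vs) == len(ws) and all(v <= w for v, w in zip(vs, ws))


c1 = ("foo", (frozenset({1}),))
c2 = ("foo", (frozenset({2}),))
c3 = ("bar", (frozenset({1}),))
joined = lub_cls(c1, c2)
```

Joining closures of the same function merges their closed-over values component by component, while joining closures of different functions loses all information about which function is involved.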


9.2.2 Termination of Abstract Interpretation 


The KID~ type system guarantees that all closures that are created are of finite depth, where the
maximum depth can be fixed at compile-time. This fact, plus the fact that there are only a fixed 
number of procedure texts and MakeClosure expressions in a given program, guarantees that there 
can be only a finite number of possible values for any given abstract closure arising during the 
abstract interpretation of a program. Thus, abstract interpretation of a program still takes a finite 
number of iterations to compute the function environment. 


9.2.3 Abstract Interpretation of Closure Primitives


We also add the clauses to the abstract interpreter for the two primitives that create and manipulate
closures: $\mathsf{MakeClosure}_{f_i}$, which closes the procedure named $f_i$ over some set of argument values, and
Apply, which applies a closure to a set of values. The clause for MakeClosure, shown in Figure 9.4,
uses the static expression label $l$ alone as the object label of the allocated closure.


The clause for Apply, shown in Figure 9.5, first interprets the first argument to yield a set $ls$ of
references to abstract closures. The result is the least upper bound of the result of invoking each
of these closures with the arguments supplied to Apply as well as the values carried in the closures.
Each abstract closure is invoked by determining the function name $f_i$ and the argument values,
and then looking up the entry for $f_i$ and those values in the function environment $\Phi$. In addition,
these values are added to the domain map $AP'$ for function $f_i$.


9.2.4 Analysis of Higher-Order Programs 


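Now that we have seen how to extend the abstract domains and the abstract interpreter in order
to handle higher-order functions, let us see how this affects analysis of programs containing
higher-order functions. There are a number of ways this can affect the analysis and transformation of
such programs. It can cause loss of information, because we have less idea what computation will
be performed by an expression. It can also cause added complexity in the analysis, because it
is harder to construct representative input values. But, by exposing a higher-order function as a
closure (a data structure), we have enabled the compiler to perform storage management on
closures themselves.


\[
\begin{array}{l}
\mathcal{E}_A[\![\, {}^{l}\mathsf{MakeClosure}_{f_i}(se_1, \ldots, se_n) \,]\!]\ \rho\ \sigma\ a\ \Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![\, se_1 \,]\!]\ \rho; \\
\qquad \vdots \\
\qquad v_n = \mathcal{SE}_A[\![\, se_n \,]\!]\ \rho; \\
\qquad v_{cls} = \langle \mathsf{Cls}\ f_i, v_1, \ldots, v_n \rangle; \\
\qquad ol = a{:}l; \\
\qquad \sigma' = \sigma[\, ol \mapsto \sigma[ol] \sqcup v_{cls} \,]; \\
\quad \mathbf{in}\ (\{ol\}, \sigma', \emptyset, \bot_{Dmap})\ \}
\end{array}
\]


Figure 9.4: Abstract evaluation of the closure constructor


\[
\begin{array}{l}
\mathcal{E}_A[\![\, {}^{k}\mathsf{Apply}(se, se_{n+1}, \ldots, se_r) \,]\!]\ \rho\ \sigma\ a\ \Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![\, se \,]\!]\ \rho; \\
\qquad v_{n+1} = \mathcal{SE}_A[\![\, se_{n+1} \,]\!]\ \rho; \\
\qquad \vdots \\
\qquad v_r = \mathcal{SE}_A[\![\, se_r \,]\!]\ \rho; \\
\qquad a' = \mathcal{NAL}(a, k); \\
\qquad (v', \sigma', AW', AP') = \displaystyle\bigsqcup_{ol \in ls} \{\ \langle \mathsf{Cls}\ f_i, v_1, \ldots, v_n \rangle = \sigma[ol]; \\
\qquad\qquad (v', \sigma', AW') = \Phi[f_i][(v_1, \ldots, v_n, v_{n+1}, \ldots, v_r, \sigma, a')]; \\
\qquad\qquad AP'[f_i] = \{(v_1, \ldots, v_n, v_{n+1}, \ldots, v_r, \sigma, a')\}; \\
\quad \mathbf{in}\ (v', \sigma', AW', AP')\ \}
\end{array}
\]


Figure 9.5: Abstract evaluation of closure application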


We may lose information about the lifetime of an object created within a procedure if it is passed to 
a higher-order function because we may have to make worst case assumptions about the behavior 
of the function passed as an argument. 


In the algorithms described in Chapter 5, we began computation of the function environment by 
computing the value of the application of each function in the program to a representative input 
value. What representative values should we use for functions which take closures as input? What 
values should we use when analyzing the body of a function and verifying or inserting deallocation 
commands? 


During the analysis of a function body, we really do want to pass in some function value that 
captures the behavior of any function that could be passed in at run-time. Either we use the least 
upper bound of all possible values that arise under abstract interpretation, or we make a worst-case 
assumption about the behavior of the function. This value must satisfy the constraints of the type 
system, but all input values could conceivably be carried in the result of the application. Even
worse, if I-structures or other side-effecting operations are supported, all input values (of the right 
type) could be side-effected or stored off in some structure reachable from the input values. 


The approach of constructing representative input values for higher-order inputs to functions only 
pays off if it allows us to manage the storage of closures. Otherwise, we may as well use the least
upper bound of all possible values that could be passed as input to this function. 


If we look at the whole program, then we can actually determine the types of all the closures created 
in the program (assuming monomorphic typing), and use the set of all closures of the correct type 
as the input value to a function that takes closures as arguments. This process may be equivalent 
to taking the least upper bound of all possible inputs to a function that arise in the abstract 
interpretation described above, and analyzing the function when applied to this least upper bound. 
This process is similar to the behavior of collecting interpreters [24, 44]. 


It seems that it is better to use the most general function value that could ever be passed as input 
to a procedure during the analysis of that procedure than to construct representative closure values. 
We are likely to lose too much information if we use worst-case representative closure values rather 
than the closure values that arise during abstract interpretation. 


9.3 Example of Abstract Interpretation of Higher-Order Functions


Let us consider the abstract interpretation of the following program. In the main procedure $f_0$, one
of two higher-order functions is called depending on the value of predicate p. What is the behavior 
of this program under the abstract interpreter? 


{
def f0 () =
  { p = e0;
    f = if p
        then l0:MakeClosure_foo (10)
        else l1:MakeClosure_bar (True, 3);
    z = e1;
    r = Apply(f, z)
  in r };

def foo (n,m) =
  l2:MakeTuple (n,n+m);

def bar (b,n,m) =
  { x = if b then 3 else 4;
    t = l3:MakeTuple(x,n-m);
  in t };
}


We would like to know what the result of the invocation of function $f_0$ is under the abstract
interpreter. The function, or closure, to which variable f is bound is dependent on the value of
variable p, a run-time value. Therefore, we must abstract the behavior of f over all executions.


If we evaluate the bindings of the letrec block in the body of $f_0$, we get the following environment
and store:


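\[
\begin{array}{rcl}
\rho' &=& \left[
\begin{array}{lcl}
p &\mapsto& B \\
f &\mapsto& \{l_0, l_1\} \\
z &\mapsto& N \\
r &\mapsto& \{l_2, l_3\}
\end{array}
\right] \\[1ex]
\sigma' &=& \bot_{Store}\left[
\begin{array}{l}
l_0 \mapsto \langle \mathsf{Cls}\ foo, N \rangle \\
l_1 \mapsto \langle \mathsf{Cls}\ bar, B, N \rangle \\
l_2 \mapsto \langle \mathsf{Tuple}\ N, N \rangle \\
l_3 \mapsto \langle \mathsf{Tuple}\ N, N \rangle
\end{array}
\right] \\[1ex]
\mathit{AT} &=& \emptyset
\end{array}
\]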


We can see by examining $\rho'$ and $\sigma'$ that f can be bound to a value which is either a closure of foo
over a number or a closure of bar over a boolean and a number. In order to obtain the value of
variable r, the interpreter had to evaluate the application of foo applied to two numbers and the
application of bar applied to a boolean and two numbers.
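
The way the abstract Apply combines the two possible closures can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the `SUMMARIES` table is a hand-written stand-in for the function environment entries of foo and bar, abstract values are the strings "N" and "B" or sets of labels, and the domain map is omitted.

```python
def lub(v, w):
    """Join of two abstract values: union for label sets, identity for equal scalars."""
    if isinstance(v, frozenset) and isinstance(w, frozenset):
        return v | w
    return v if v == w else "TOP"


# Hypothetical summaries: abstract arguments -> (abstract result, store additions)
SUMMARIES = {
    "foo": lambda n, m: (frozenset({"l2"}), {"l2": ("Tuple", "N", "N")}),
    "bar": lambda b, n, m: (frozenset({"l3"}), {"l3": ("Tuple", "N", "N")}),
}


def abstract_apply(ls, args, store):
    """Join the results of applying every closure referenced by the label set ls."""
    result, out = None, dict(store)
    for ol in sorted(ls):
        _tag, fname, *closed = store[ol]
        value, additions = SUMMARIES[fname](*closed, *args)
        out.update(additions)
        result = value if result is None else lub(result, value)
    return result, out


store = {"l0": ("Cls", "foo", "N"), "l1": ("Cls", "bar", "B", "N")}
r, store_after = abstract_apply(frozenset({"l0", "l1"}), ("N",), store)
```

Under these assumptions `r` is the label set containing both l2 and l3, matching the binding of r in the environment above.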


Chapter 10 


Performance Analysis 


This chapter discusses the performance of an implementation of the analysis and transformations 
described in this thesis. The first section discusses our implementation of the verification and 
insertion algorithms, and Monsoon [36], the machine on which we ran our benchmarks. The second 
section presents the experiments themselves. The third section presents an optimization that 
generates code to allocate structures in procedure activation frames whenever possible and discusses 
how this affects the run-time performance of programs. The fourth section describes the difficulty of 
deallocating structures that may pass through zero-tripping loops, loops that execute zero or more 
times. The fourth section also describes a code generation strategy that can solve this problem. The 
fifth section describes an optimization that hoists matched allocation and deallocation commands 
out of loops in order to reduce the run-time overhead of storage management. 


10.1 Implementation Details 


Most of the theory developed in this thesis has actually been put into practice. We have an
implementation of the abstract interpreter, the deallocation command verification algorithm, and the
deallocation command insertion algorithm. This section describes the details of our implementation 
and the structure of the experiments we used to determine the overall effectiveness of our methods. 


10.1.1 Implementation of the Verification and Insertion Algorithms 


Our implementation of the deallocation command verification and insertion algorithms handles
tuples, arrays, algebraic types, lists, and I-structures as well as a number of scalar types: booleans,
integers, floating point numbers, characters, and symbols. The implementation uses activation 
labels similar to those described in Chapter 6, but higher-order functions are not supported. The 
implementation handles conditionals, loops, and the limited form of barriers shown in this thesis. 
The current implementation of the compiler does not insert conditional deallocation commands, 
but it does attempt to get complete coverage of deallocatable structures using a greedy algorithm 
and a careful ordering of the identifiers whose values may be deallocated. 


The deallocation command verification and insertion tools are implemented as two new modules
in the Id Compiler [40]. Both modules operate on program graphs, which are basically a dataflow
graph representation of KID~. The first module computes the function environment for the whole
program and verifies and annotates each function definition. The second module walks over the
program again, and actually inserts deallocation commands and barriers where the first module
annotated the graph.


The compiler uses the behavior of each function over its representative input as the behavior of 
that function over any input, as described in Section 5.2. The compiler must compute the behavior 
of all mutually recursive procedures together, but in general computes the entries in the function 
environment in an order determined by a topological sort of the recursive-set nodes in the program. 
A recursive-set node consists of either a function alone, for non-recursive functions, or a function and 
all of the functions it calls recursively, for recursive functions. This allows the compiler to compute 
the function environment for each function $f_i$ before all non-recursive calls to $f_i$. Computation
of input-output mappings for each recursive-set in topological order also speeds up analysis by 
making the function environment converge faster. The analysis module takes time proportional to 
the number of recursive-sets and time quadratic in the size of the recursive sets. 


In more detail, the first module computes the call graph of the program. From the call graph, the 
compiler determines the recursive-sets of the program and the order in which function environment 
entries must be computed. The compiler then generates the representative inputs and computes 
the function environment in topological order. 
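
One way to obtain the recursive-sets together with a suitable processing order is to compute the strongly connected components of the call graph. The Python sketch below uses Tarjan's algorithm, which emits components with callees before callers. The choice of algorithm, the dictionary encoding of the call graph, and the example graph are assumptions for illustration and are not taken from the Id compiler.

```python
def recursive_sets(call_graph):
    """Strongly connected components of a call graph, callees first.

    call_graph maps each function name to the names it calls.
    """
    index, low = {}, {}
    stack, on_stack, order = [], set(), []
    counter = 0

    def visit(f):
        nonlocal counter
        index[f] = low[f] = counter
        counter += 1
        stack.append(f)
        on_stack.add(f)
        for g in call_graph.get(f, ()):
            if g not in index:
                visit(g)
                low[f] = min(low[f], low[g])
            elif g in on_stack:
                low[f] = min(low[f], index[g])
        if low[f] == index[f]:          # f is the root of a component
            component = set()
            while True:
                g = stack.pop()
                on_stack.discard(g)
                component.add(g)
                if g == f:
                    break
            order.append(component)

    for f in call_graph:
        if f not in index:
            visit(f)
    return order


GRAPH = {
    "main": ["inc_scale_list"],
    "inc_scale_list": ["scale_list", "inc_list"],
    "scale_list": ["scale_list"],
    "inc_list": ["inc_list"],
}
order = recursive_sets(GRAPH)
```

Processing the components in the returned order means that the function environment entry for a callee is available before any caller outside its recursive-set is analyzed.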


Next, the compiler visits each procedure and applies first the deallocation command verification 
algorithm and then the insertion algorithm. Any time a potentially unsafe deallocation command 
is found, the compiler issues a warning with as much identification information as possible. 


The insertion algorithm works on one control region at a time. Control regions in the program 
graph correspond to the bodies of procedures, the branches of conditional and case expressions, 
and the code before a barrier. Each of these regions must have been a letrec block in the original 
KID~ code. 


Within each control region, the compiler determines all of the output ports (which correspond to 
the definition of an identifier in the letrec block) that will produce structures whose lifetime is 
definitely contained by that of the control region. These ports are then sorted by the size of the 
sets of labels to which they may be bound. Any port whose label set contains another port’s label 
is discarded. This process is repeated until we are left with a set of ports whose label sets are 
disjoint. The compiler then inserts deallocation commands on each of these ports. 
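
The selection of ports can be sketched as a greedy pass in Python. This is one plausible reading of the procedure described above, not the compiler's code: ports are visited from smallest to largest label set, and a port is kept only if its labels do not overlap those of a port already kept. The port names and label sets in the example are hypothetical.

```python
def select_ports(ports):
    """Choose ports with pairwise disjoint label sets, smallest sets first.

    ports maps a port name to the set of labels it may be bound to.
    """
    chosen, covered = [], set()
    by_size = sorted(ports.items(), key=lambda item: (len(item[1]), item[0]))
    for name, labels in by_size:
        if covered.isdisjoint(labels):
            chosen.append(name)
            covered |= labels
    return chosen


PORTS = {
    "x'": {"l0", "l1"},
    "sl": {"l0", "l1", "l5"},   # overlaps x', so it is discarded
    "t": {"l7"},
}
selected = select_ports(PORTS)
```

Preferring the smaller label sets keeps the more precisely known structures, and the disjointness of the result ensures that no location is covered by two deallocation commands.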


Any time the compiler inserts a deallocation command, it informs the programmer where the
deallocation command was inserted. If the components of a structure can be deallocated, the compiler
will insert selection and deallocation code for these elements. The compiler has special cases for
inserting code to deallocate arrays and their components (shared or unshared) and to deallocate lists
recursively (cyclic or acyclic). The current implementation does not insert conditional deallocation
commands.


In addition to the two modules that implement the deallocation verification and insertion algo- 
rithms, there is a module that generates code to allocate structures in activation frames rather 
than the heap. This module finds structures of static size that are allocated and deallocated within 
the same control region and changes them to be frame allocated. Restricting this module to apply 
only to structures allocated and deallocated within the same control region — rather than within 
the same procedure — limits its usefulness slightly. Nevertheless, this module is fairly effective at 
converting general allocation and deallocation code into frame-based allocation and deallocation 
code. The restriction that the sizes of frame-allocated objects must be known at compile-time is 
imposed by the Id Run Time System (Id-RTS) [41] on the Monsoon dataflow machine [36] which 
must know the complete size of an activation frame before a procedure is called. We discuss the
effectiveness of this optimization in Section 10.3. 


10.1.2 Monsoon 


Monsoon [36] is a dataflow machine with an explicit token store. Instead of using a hashing 
function to match the token pairs associated with instruction instances, each instruction has an 
explicit address (relative to an activation or frame pointer) where operand matching occurs. 


All of the experiments described in this chapter were run on a configuration of Monsoon hardware 
consisting of one processor and one I-structure unit. Each Monsoon processing element (PE)
contains 256K 32-bit words of instruction memory, 256K 64-bit words of data memory used for 
activation frames, and 256K element token queues. The processor consists of an eight stage pipeline 
operating at 10 MHz. Eight different threads of computation are interleaved in the pipeline. 


Each I-structure (IS) unit consists of 4M 64-bit words of data memory. Each word of data memory
on both the PE and IS boards has an associated 3 presence-bits and 8 type-bits. The presence bits 
indicate whether a word of memory is empty or present and are the basic mechanism for fine-grain 
synchronization on Monsoon. The presence bits in activation frames are used for operand matching 
while the presence bits in heap memory are used to implement I-structure semantics. 


Monsoon is heavily instrumented. Each processor has a statistics processor, containing 64 statistics 
registers, that counts on a cycle-by-cycle basis what type of operations were executed and to what 
group of procedures those operations belonged. One of these counters is incremented every cycle. 
The counters are divided into 8 banks of 8 counters. The counter to be incremented is determined 
by the operation type and a 3-bit color field from an executing token’s continuation. For most
operations, the 3-bit color field is used to choose one of the first 7 banks of counters, and the 
operation type is used to choose one of the 8 counters in the chosen bank. Events such as idle 
pipeline cycles are counted in the last bank of 8 counters. 


These statistics counters allow us to measure the utilization of the machine very precisely. We can 
account for how much time is spent in the user’s program, how much is spent in the Run-Time 
System (RTS), and how much is spent with the processor idle. We use the statistics counters to 
measure the performance of our examples. 


10.1.3 Id Run-Time System on Monsoon


The version of the run-time system that we used when running these experiments consisted of a 
frame manager and a heap manager. The frame manager uses a single free list to manage unused 
activation frames, and so it only allocates one size of activation frame. The run-time system is 
initialized so that this frame size is large enough for all procedures. 


The heap manager uses the quick-fit algorithm [43] to manage deallocated storage. This algorithm 
incurs one word of overhead for all objects that are allocated. This overhead is insignificant for large 
objects, but is significant for small objects such as cons cells. Under this management strategy, 
cons cells take three words apiece. 
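
The effect of the one-word overhead can be seen in a heavily simplified Python sketch of an allocator that keeps one free list per exact block size. This is only an illustration of the accounting, not the Id run-time system's heap manager; the class `QuickFitHeap` and its bump-pointer fallback are assumptions.

```python
class QuickFitHeap:
    """Toy allocator with one free list per exact block size."""

    HEADER_WORDS = 1  # the one word of overhead per allocated object

    def __init__(self):
        self.free_lists = {}   # block size in words -> list of free addresses
        self.next_free = 0     # bump pointer for fresh storage

    def allocate(self, payload_words):
        size = payload_words + self.HEADER_WORDS
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()        # reuse a deallocated block of this size
        addr = self.next_free
        self.next_free += size
        return addr

    def deallocate(self, addr, payload_words):
        size = payload_words + self.HEADER_WORDS
        self.free_lists.setdefault(size, []).append(addr)


heap = QuickFitHeap()
a = heap.allocate(2)   # a cons cell: head and tail
b = heap.allocate(2)
heap.deallocate(a, 2)
c = heap.allocate(2)   # reuses the block released above
```

Two consecutive cons cells end up three words apart, so one third of the storage used for cons cells is overhead, whereas for a 1000-word array the same header is negligible.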


All structures in Id are implemented as I-structures. Each word of an I-structure has presence-bits 
that indicate whether that word is empty or present. Stores cause the presence-bits of a word to go 
from empty to present as well as changing the value of the word. Fetches issued against an empty 
word defer until a value is stored in that word. 
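
The behavior of a single I-structure word can be modelled in Python as follows. This is a software illustration only; on Monsoon the presence bits are maintained by hardware. The class `IStructureCell`, the use of callbacks to represent deferred fetches, and the error raised on a second store are assumptions of this sketch.

```python
class IStructureCell:
    """One word of an I-structure with an empty/present state."""

    def __init__(self):
        self.present = False
        self.value = None
        self.deferred = []     # fetches waiting for a store

    def fetch(self, continuation):
        """Deliver the value now if present, otherwise defer the fetch."""
        if self.present:
            continuation(self.value)
        else:
            self.deferred.append(continuation)

    def store(self, value):
        """Move the word from empty to present and satisfy deferred fetches."""
        if self.present:
            raise RuntimeError("word is already present")
        self.value, self.present = value, True
        waiting, self.deferred = self.deferred, []
        for continuation in waiting:
            continuation(value)

    def clear(self):
        """Reset the word to empty, as the heap manager does for free memory."""
        self.present, self.value = False, None


cell = IStructureCell()
received = []
cell.fetch(received.append)    # deferred: the word is still empty
before_store = list(received)
cell.store(7)                  # satisfies the deferred fetch
cell.fetch(received.append)    # immediate: the word is present
```

The `clear` method corresponds to the work the heap manager must do for every word of a deallocated object, which is why the cost of clearing presence bits appears in the run-time system's share of execution time.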


One of the duties of the heap manager is to clear the presence-bits of each word of memory to
empty. The current Id RTS clears the presence-bits of all words of the heap during an initialization 
phase before a program is executed. During program execution, presence-bits are cleared whenever 
an object is deallocated. The heap manager maintains the invariant that all of the presence-bits of 
free memory are empty. 


In steady-state, when as many objects are being deallocated as are allocated, it does not matter 
whether presence-bits are cleared upon allocation or upon deallocation. However, if presence-bits 
are cleared upon deallocation, the difference in run-time between programs that reclaim storage 
and those that never reclaim storage can be significant — programs that do not reclaim storage 
are not charged for clearing the presence-bits of the I-structures that they allocate. Under this 
strategy, a program that does not reclaim storage will have better performance than one that does 
reclaim storage unless it runs out of memory. 


We find that most of our programs that allocate and deallocate approximately equal amounts of 
storage spend about half their time in the run-time system. Of the time spent executing run-time 
system code, half is spent clearing presence-bits, and the other half is spent manipulating the data 
structures that keep track of free and allocated storage. 


The activation frame and heap managers both contain code to record the maximum amount of 
storage that was allocated and the current amount of storage allocated. We use this code to gather 
statistics about the amount of storage used by our example programs. 


10.1.4 Structure of the Experiments 


For each program we studied, we determined storage usage and execution time without storage 
deallocation, and storage usage and execution time with the best hand-inserted deallocation. Then 
we recompiled the programs to verify the hand-inserted deallocation commands, recording the per- 
cent increase in compile-time and the static percentage of deallocation commands verified. We also 
recompiled the original programs to insert deallocation commands automatically, again recording 
the percent increase in compile-time and the static percentage of deallocation commands inserted. 
Finally, we ran the programs again to determine dynamic storage usage and execution time for 
the programs with verified deallocation commands only and automatically inserted deallocation 
commands only. 


10.2 Performance Measurements 


This section describes the compile-time performance of our implementation of the verification and 
insertion algorithms. It also describes the run-time performance of the various versions of the com- 
piled code. The first example described is the Wavefront benchmark. Wavefront is an example we 
use to illustrate the use of non-strictness in the definition of relaxation programs. The second exam- 
ple described is the Simple hydrodynamics benchmark. Both Wavefront and Simple are programs 
with very static structure. Both of these programs use arrays as their major data structure. The 
third example described is the Gamteb benchmark. This example has a more dynamic structure, 
because the heart of the simulation is a set of 7 mutually recursive procedures. Gamteb allocates 
a large number of tuples as it simulates the trajectories of photons in a carbon rod. 


def multiwave edge_vector n = 
  { m = initial_wave edge_vector ; 
    r = 
      {for i <- 1 to n do 
         next m = wave m ; 
       finally m } 
    in r }; 


Figure 10.1: The code for multiwave 


def multiwave edge_vector n = 
  { m = initial_wave edge_vector ; 
    r = 
      {for i <- 1 to n do 
         next m = wave m ; 
         Dealloc(m) ; 
       finally m } 
    _ = if (1 <= n) then Dealloc(m); 
    in r }; 


Figure 10.2: The annotated code for multiwave 


10.2.1 The Wavefront Benchmark 


The Wavefront benchmark is a simple example used to test automatic storage reclamation. The 
outer loop of the example is shown in Figure 10.1. Procedure initial_wave allocates a matrix, and 
in each iteration procedure wave reads matrix m and creates a new matrix. The matrix passed into 
each iteration of the loop is garbage upon termination of that iteration. The analyzer correctly 
determines this and allows the compiler to generate the code in Figure 10.2. 


We can reclaim the storage associated with the value of initial_wave whenever the loop executes 
at least once. 


The following table contains the compile-times for the Wavefront benchmark. The four versions of 
the program are Wavefront_NA, Wavefront_HA, Wavefront_VF and Wavefront_AA. Wavefront_NA is the 
original version, without any deallocation commands. This program was compiled by the unmodified 
Id compiler. Wavefront_HA is a hand-annotated version that contains deallocation commands 
that were inserted manually. It was also compiled with the unmodified Id compiler. Wavefront_VF 
is the hand-annotated version as compiled by the Id compiler with the lifetime analysis and 
deallocation verification module. All unsafe deallocation commands are removed by the compiler. 
Wavefront_AA is the unannotated version of the Wavefront program compiled with both the lifetime 
analysis and deallocation insertion modules. The number of deallocation commands is a static 
count of all of the deallocation commands in the program. 


Program       | Compile-Time | Deallocs 
              | (seconds)    | (number) 
Wavefront_NA  | 18           | 0 
Wavefront_VF  | 32           | 2 


The hand-annotated version contains three deallocation commands to deallocate the edge vector, 
the first matrix, and each intermediate matrix. The compiler-verified and compiler-annotated 
versions each contain two deallocation commands: one for the edge vector, and one for the 
intermediate matrices. 


The compiler cannot determine that the first matrix will not be returned as the result, so it cannot 
insert code to deallocate that matrix. A programmer can insert conditionals to prevent error in 
this case. 


The following table describes the run-time performance of the four versions of Wavefront. Each 
program was run for 40 iterations on a 30 x 30 matrix. The table gives the total run-time for each 
program, as well as the maximum amount of storage that was allocated, in words, and the final 
number of words of storage that were still allocated when the programs terminated. 


Program       | Run-Time  | Max Storage | Final Storage 
              | (seconds) | (words)     | (words) 
Wavefront_NA  | 0.193     | 37,225      | 37,225 
Wavefront_VF  | 0.336     | 10,000      | 1814 


The original version of this program runs the fastest, but it also uses the most storage. The 
hand-annotated version takes 81% longer. However, it deallocates all but the final matrix. The main 
reason the versions containing deallocation code take longer to execute is that the deallocation 
code must clear the presence-bits of the objects being deallocated. 


The compiler-verified and compiler-annotated versions deallocate all but the first and last matrices. 
Deallocation of the first matrix cannot be verified, because if we execute zero iterations, the first 
matrix is returned as the result, and the compiler cannot prove that we execute more than zero 
iterations. We discuss this problem in more detail in Section 10.4. 


10.2.2 Simple 


Simple, a hydrodynamics benchmark program [13], is a scientific program with very simple control 
structure. If compiler-directed storage reclamation is going to have any success, it should be able to 
reclaim every intermediate structure allocated in this program. In fact, our first implementation of 
the program annotator, which did not handle nested structures, had very good success on Simple. 
It inserted Dealloc statements that deallocated seventy percent (dynamically) of the structures 
allocated by the program at run-time. Unfortunately, these were tuples that contained a number of 
large matrices, and so this was a small fraction of the total storage allocated: only thirty percent 
for a problem size of ten by ten. 


The following table contains the compile-times for the Simple benchmark under four conditions: 
not annotated (NA), hand annotated (HA), verified safe deallocation commands only (VF) and 
automatically annotated (AA). 


Program    | Compile-Time | Deallocs 
           | (seconds)    | (number) 
Simple_NA  | 409          | 0 
Simple_VF  | 863          | 58 


Compilation of the hand-annotated version of Simple took slightly longer than the original version, 
while compilation with the lifetime analyzer and deallocation command verifier or inserter 
took twice as long as compilation of the original program. 


The twelve deallocation commands (70 - 58) that could not be verified as safe were all potentially 
unsafe because they deallocated structures that may escape if a loop executed zero iterations. These 
deallocation commands in version HA are actually safe, because the loop never executes fewer than 
one iteration. 


The following table contains information about the run-time performance of the four versions of 
Simple. Each version was run twice: once for 20 iterations of a 50 x 50 matrix, and once for 40 
iterations of a 50 x 50 matrix. 


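Program    | Size    | Iters | Run-Time  | Max Storage | Final Storage 
           |         |       | (seconds) | (words)     | (words) 
Simple_NA  | 50 x 50 | 20    |           |             | 
Simple_NA  | 50 x 50 | 40    |           | 3,324,447   | 3,324,447 
Simple_HA  | 50 x 50 | 20    |           | 114,147     | 40,941 
Simple_HA  | 50 x 50 | 40    |           | 114,147     | 40,941 
Simple_VF  | 50 x 50 | 20    |           | 114,147     | 
Simple_VF  | 50 x 50 | 40    |           | 114,147     | 
Simple_AA  | 50 x 50 | 20    |           | 114,147     | 
Simple_AA  | 50 x 50 | 40    |           | 114,147     | 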


Each version that contains deallocation commands took about 33% longer to run than the version 
that had no deallocation commands. However, each of these deallocated 93% to 97% of the storage 
that they allocated. Each of the three versions containing deallocation commands reclaims all of 
the storage allocated during each iteration. The only difference in the amount of storage that they 
use is in how much of the storage allocated for initial data structures is eventually reclaimed. 


10.2.3 Gamteb 


Gamteb [8], a Monte Carlo simulation of photon transport in a graphite rod, is another scientific 
program on which this system should have good success. The Id version has a slightly more complex 
structure than the original Fortran: the Id version uses a recursive procedure to simulate particle 
transport. This recursive procedure is called from a parallel outer loop. Each recursive procedure 
is called with a new particle and returns a new tuple of counts. The particle tuples passed in can 
be deallocated upon termination of the recursive call, and the count tuples returned as the result 
of the recursive call are read and may be deallocated upon termination of each invocation of the 
outer loop. 


A version of Gamteb with hand-inserted deallocation commands contained 38 deallocation commands. 
The compiler verifies the safety of 37 of these deallocation commands. The compiler fails 
to verify one deallocation command that reclaims a structure that may be passed through a 
zero-tripping loop. The compiler can insert 35 deallocation commands. It fails to insert two 
deallocation commands that reclaim structures that may be passed through zero-tripping loops. 


{ def frame_tuple(x,y) = 
    { ft = MakeTuple(x,y); 
      result = g(ft); 
      in result }; 

  def g(t) = 
    { x = Select_1(t) 
      in x }; 

  in frame_tuple(68,47) } 


Figure 10.3: Frame allocated tuple example 


The following table contains the compilation times for the four versions of Gamteb. 


Program    | Compile-Time | Deallocs 
           | (seconds)    | (number) 
Gamteb_NA  | 158          | 0 
Gamteb_HA  | 183          | 36 
Gamteb_VF  | 976          | 34 
Gamteb_AA  | 980          | 34 


The following table contains information about the run-time performance of the four versions of 
Gamteb. 


Program    | N    | Run-Time  | Max Storage | Final Storage 
           |      | (seconds) | (words)     | (words) 
Gamteb_NA  | 1000 |           | 982,315     | 982,315 
Gamteb_NA  | 2000 |           | 1,839,952   | 1,839,952 
Gamteb_HA  | 1000 |           | 4710        | 132 
Gamteb_HA  | 2000 |           | 5100        | 132 
Gamteb_VF  | 1000 |           |             | 
Gamteb_VF  | 2000 |           |             | 
Gamteb_AA  | 1000 |           |             | 
Gamteb_AA  | 2000 |           |             | 


10.3 Transformation to Frame Allocation 


When the compiler finds a structure that is allocated and deallocated in the same control region, it 
can transform the heap allocation into frame allocation. Deallocation of the structure then happens 
automatically when the procedure exits. In other words, the compiler sets aside enough storage in 
the activation frame of the procedure to contain the structure. In some implementations, such as 
the implementation of Id on Monsoon, this is only possible if the structure size is known statically. 


{ def frame_tuple(x,y) = 
    { ft = MakeFrameTuple(x,y); 
      result = g(ft); 
      _ = CleanupFrameTuple(ft); 
      in result }; 

  def g(t) = 
    { x = Select_1(t) 
      in x }; 

  in frame_tuple(68,47) } 


Figure 10.4: Frame allocated tuple example with transformation 


In other implementations, where activation frames are stack allocated, the procedure may be able 
to dynamically allocate space in its activation frame by adjusting its stack pointer. 


Procedure frame_tuple shown in Figure 10.3 contains a tuple bound to identifier ft that may be 
frame allocated, because the structure allocated by the MakeTuple expression in procedure 
frame_tuple does not escape from the invocation of frame_tuple. 


Figure 10.4 contains the transformed code for this example. The primitive 
MakeFrameTuple allocates a tuple in the frame. The semantics of the tuple is exactly the same as for 
a heap-allocated tuple, except that the storage is automatically reclaimed upon termination of the 
procedure frame_tuple. The primitive CleanupFrameTuple performs any cleanup required by the 
run-time system. The Id run-time system requires that all frames be empty when returned, and so 
CleanupFrameTuple clears out the storage used by the tuple. 
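

The difference between the two versions can be mimicked in a few lines of Python. The sketch
below is only an analogy: the Heap class, the two-slot list standing in for an activation frame,
and the choice that g reads the first field are all assumptions made for illustration.

```python
class Heap:
    """Counts calls to the heap manager and the words currently allocated."""

    def __init__(self):
        self.live = 0
        self.calls = 0

    def make_tuple(self, *fields):
        self.calls += 1
        self.live += len(fields)
        return list(fields)

    def dealloc(self, tup):
        self.calls += 1
        self.live -= len(tup)


def g(t):
    return t[0]  # reads the tuple; does not let it escape


def frame_tuple_heap(heap, x, y):
    ft = heap.make_tuple(x, y)  # heap-allocated tuple
    result = g(ft)
    heap.dealloc(ft)            # explicit deallocation before returning
    return result


def frame_tuple_frame(heap, x, y):
    frame = [None, None]        # slots reserved in the activation frame
    frame[0], frame[1] = x, y   # plays the role of MakeFrameTuple
    result = g(frame)
    frame[0] = frame[1] = None  # plays the role of CleanupFrameTuple
    return result


heap = Heap()
r1 = frame_tuple_heap(heap, 68, 47)
calls_after_heap_version = heap.calls
r2 = frame_tuple_frame(heap, 68, 47)
```

Both versions compute the same result, but the frame-allocated version never calls the heap
manager, which is where the saving comes from.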


The following table summarizes the results when we compile the hand annotated version of Gamteb 
with the frame allocation optimization enabled: 


Program      | Compile-Time | Deallocs 
             | (seconds)    | (number) 
Gamteb_HA    | 183          | 38 
Gamteb_HAFA  | 183          | 38 


The following table contains information about the run-time performance of the Gamteb benchmark 
compiled with the frame allocation optimization enabled. 


Program      | N    | Run-Time  | Max Storage | Final Storage 
             |      | (seconds) | (words)     | (words) 
Gamteb_HA    | 1000 |           | 4710        | 132 
Gamteb_HAFA  | 1000 |           | 3510        | 132 
Gamteb_HAFA  | 2000 |           | 3500        | 132 


The version of Gamteb that uses frame allocation runs 13% faster than the original version, and uses 
less total storage. The optimization itself is very straightforward and does not increase compile-time 
noticeably. 


10.4 Handling Possibly Zero-Tripping Loops 


A common idiom in functional implementations of scientific programs is a structure that is created 
and then successively refined in a loop or tail recursion. Often, only the final value is needed, and 
the initial value and all intermediate values can be reclaimed. However, if the compiler cannot 
determine that the loop will execute at least once, then it cannot tell that the final value could not 
be the initial value, and the initial value will never be reclaimed by the compiler. 


Here is such an example: 


def multiwave ev k = 
{ M = initial_wave ev; 
in {for i <- 1 to k do 
next M = wave M; 
finally M }}; 


The initial value of M, allocated by initial_wave, will be returned as the final value of the loop if 
the value of k is less than one. 


We can provide run-time checking to ensure that the initial matrix is only deallocated if it is not 
returned as the result by testing the initial value of the loop predicate. The following code has this 
transformation. 


def multiwave ev k = 
{ M = initial_wave ev; 
r = {for i <- 1 to k do 
next M = wave M; 
finally M } 
 _ = if k > 0 then deallocate M; 
in r }; 


The code after the barrier deallocates the initial copy of M if k is at least one. In Wavefront, 
this optimization only reclaims one object, so it is not very interesting. We applied the same 
optimization to Gamteb with much more spectacular results. 
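

The role of the guard can be checked with a small simulation. The Python sketch below is not the
compiler's output: the Heap class is hypothetical, alloc stands in for initial_wave and wave, and
the intermediate matrices are freed inside the loop so that only the guarded deallocation of the
first matrix is at issue.

```python
class Heap:
    """Tracks which objects are live so that unsafe deallocation is visible."""

    def __init__(self):
        self.live = set()
        self.next_id = 0

    def alloc(self):
        obj = self.next_id
        self.next_id += 1
        self.live.add(obj)
        return obj

    def dealloc(self, obj):
        self.live.remove(obj)


def multiwave(heap, k):
    first = m = heap.alloc()     # stands in for initial_wave
    for _ in range(k):
        new_m = heap.alloc()     # stands in for wave m
        if m != first:
            heap.dealloc(m)      # intermediate matrices are garbage
        m = new_m
    if k > 0:                    # guard: the loop ran at least once,
        heap.dealloc(first)      # so the result cannot be the first matrix
    return m


heap0 = Heap()
result0 = multiwave(heap0, 0)
heap3 = Heap()
result3 = multiwave(heap3, 3)
```

With k equal to zero the first matrix is the result and stays live; with k equal to three only
the final matrix remains allocated.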


The following table summarizes the performance of Gamteb when it is compiled with the 
zero-tripping optimization turned on. The compile-time for the row labeled Gamteb_ZT includes the 
time to perform the zero-tripping optimization. The compile-time of Gamteb_ZTFA includes both 
the zero-tripping detection and frame-allocation optimizations. 


Program      | Compile-Time | Deallocs 
             | (seconds)    | (number) 
Gamteb_AA    | 980          | 34 
Gamteb_ZT    | 980          | 36 
Gamteb_ZTFA  | 981          | 36 


This optimization takes very little time, but allows the compiler to add two more deallocation 
commands to Gamteb than it could without the optimization. These two deallocation commands, as 
we can see from the following table, reduce the storage used by Gamteb considerably. Furthermore, 
once these two deallocation commands have been added, Gamteb uses a constant amount of storage 
for any number of particles simulated. 


Program      | N    | Run-Time  | Max Storage | Final Storage 
             |      | (seconds) | (words)     | (words) 
Gamteb_HA    |      |           |             | 132 
Gamteb_AA    |      |           |             | 48098 
Gamteb_AA    |      |           |             | 89398 
Gamteb_ZT    |      |           |             | 132 
Gamteb_ZT    |      |           |             | 132 
Gamteb_ZTFA  |      |           |             | 132 
Gamteb_ZTFA  |      |           |             | 132 


These performance results show that the zero-tripping loop optimization is very important, even 
though it only inserts code to deallocate one structure per loop. 


We did similar experiments with Simple to see what difference it made to reclaim the storage from 
structures that may be returned as the result of a loop. The following table shows the compile 
times and the number of deallocations inserted. The ZT version of Simple is compiled with the ZT 
transformation, which inserts twelve additional deallocation commands. 


Program    | Compile-Time | Deallocs 
           | (seconds)    | (number) 
Simple_AA  | 1100         | 58 


The following table summarizes the results of running the HA, AA, and ZT versions of Simple on 
a 50 x 50 problem size for 20 and 40 iterations. 


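Program    | Size    | Iters | Run-Time  | Max Storage | Final Storage 
           |         |       | (seconds) | (words)     | (words) 
Simple_HA  | 50 x 50 | 20    |           |             | 
Simple_HA  | 50 x 50 | 40    |           | 114,147     | 
Simple_AA  | 50 x 50 | 20    |           | 114,147     | 
Simple_AA  | 50 x 50 | 40    |           | 114,147     | 
Simple_ZT  | 50 x 50 | 20    |           | 114,147     | 
Simple_ZT  | 50 x 50 | 40    |           | 114,147     | 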


Use of the ZT transformation allows the compiler-generated deallocation commands to reclaim as 
much storage as the hand-generated deallocation commands do. 


10.5 Examples Using Lists 


This section describes the experiments we did with list manipulating programs. The first example, 
shown below, creates a list named l1 containing len integers. It then creates a list named l2 by 
incrementing each element in l1 by n1. It creates another list l3 by scaling each element in l2 by 
n2. Finally, it returns the sum of the elements of list l3. 


def test len n1 n2 = 
  { l1 = gen_list len; 
    l2 = inc_list n1 l1; 
    l3 = scale_list n2 l2; 
    r = sum_list l3; 
    in r }; 


The three list generating procedures gen_list, inc_list, and scale_list were written using list 
comprehensions. In Id, a list comprehension is syntactic sugar that expands into a loop expression 
that generates a list. List comprehensions tend to make list manipulating programs more compact. 


The Id compiler inserts code that allocates one extra cons cell for each list comprehension. The 
extra cell simplifies the code that constructs the list, because it eliminates the extra testing that 
would be needed otherwise when generating an empty list. The lifetime of the extra cons cell 
is always bounded by the control region enclosing the list comprehension, but the standard Id 
compiler does not currently insert deallocation code for this extra cell. 
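

The reason an extra cell simplifies list construction can be seen in the following Python sketch
of the header-cell technique. It illustrates the general idea only; the code that the Id compiler
actually generates for list comprehensions is not shown in this chapter, and the Cons class and
function names here are hypothetical.

```python
class Cons:
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def build_list(values):
    """Build a cons list by appending at the tail, using one extra header cell."""
    header = Cons(None)      # the extra cell: the loop can always write last.tail
    last = header
    for v in values:
        last.tail = Cons(v)  # no "is the list still empty?" test is needed here
        last = last.tail
    result = header.tail     # the real list starts after the header
    header.tail = None       # the header is now dead and could be deallocated
    return result


def to_python_list(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out


squares = build_list(x * x for x in range(4))
empty = build_list([])
```

The header cell is unreachable as soon as build_list returns, which matches the observation that
its lifetime is bounded by the enclosing control region.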


The following table shows the compile-time performance of three versions of this program: no 
annotations inserted (NA), hand inserted deallocation commands (HA), and automatically annotated 
(AA). The compiler could not verify any of the hand inserted deallocation commands because they 
are contained in procedures and violate the safety condition that we defined in Chapter 5. The 
compiler has special cases for inserting code to deallocate lists, and these were used to generate the 
automatically annotated version of the benchmark. 


Program | Compile-Time | # Dealloc | # Deallocate_List 
        | (seconds)    |           | 
NA      | 11           | 0         | 0 
HA      | 15           | 0         | 3 
AA      | 26           | 3         | 3 


The hand-annotated version of the List benchmark contains three calls to the procedure 
Deallocate_List, which deallocates all cells of a list. This procedure assumes that the list is 
acyclic. The hand-annotated version does not deallocate the extra cons cells allocated by the list 
comprehension code because there is no way to name these cells in the Id source code. The 
compiler-annotated version of the List benchmark contains three Dealloc commands to reclaim extra 
cons cells allocated by the list comprehension code, as well as three calls to 
Deallocate_Cyclic_List, which deallocates all unique cells in a list. The compiler cannot determine 
that a list is acyclic, and so it inserts code that safely deallocates both cyclic and acyclic lists. 
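

The extra cost of the cyclic-safe procedure comes from having to remember which cells have
already been visited. The Python sketch below contrasts the two traversals. The bodies of
Deallocate_List and Deallocate_Cyclic_List are not given in this chapter, so the visited-set
approach used here is an assumption, as are the Cons class and the free callback.

```python
class Cons:
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def deallocate_list(cell, free):
    """Free every cell of a list that is assumed to be acyclic."""
    while cell is not None:
        nxt = cell.tail
        free(cell)
        cell = nxt


def deallocate_cyclic_list(cell, free):
    """Free each distinct cell once, even if the tail pointers form a cycle."""
    seen = set()
    cells = []
    while cell is not None and id(cell) not in seen:
        seen.add(id(cell))   # extra bookkeeping the acyclic version avoids
        cells.append(cell)
        cell = cell.tail
    for c in cells:
        free(c)


freed = []
a = Cons(1, Cons(2, Cons(3)))
deallocate_list(a, freed.append)
acyclic_count = len(freed)

freed.clear()
c = Cons(1, Cons(2))
c.tail.tail = c              # make the list cyclic
deallocate_cyclic_list(c, freed.append)
cyclic_count = len(freed)
```

Calling deallocate_list on the cyclic list would never terminate, which is why the compiler must
fall back to the more expensive version when it cannot prove acyclicity.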


The following table contains information about the run-time performance of the three versions of 
the list manipulating benchmark. 


Program | Length  | Run-Time  | Max Storage | Final Storage 
        |         | (seconds) | (words)     | (words) 
NA      | 1000    |           | 9009        | 9009 
NA      | 100,000 |           | 900,009     | 900,009 
HA      | 1000    |           | 9009        | 9 
HA      | 100,000 |           | 900,009     | 9 
AA      | 1000    |           | 9006        | 0 
AA      | 100,000 |           | 900,006     | 0 


Both versions of this benchmark that deallocate storage take more than twice as long as the original 
code. The compiler-annotated version of this benchmark uses the least amount of storage, but takes 
the longest because the code to deallocate a potentially cyclic list is more expensive than the code 
to deallocate an acyclic list. The automatically annotated version has a lower maximum storage 
because one of the extra cons cells was deallocated before both of the others were allocated. 


10.6 Explicit Storage Reuse 


If the compiler finds a structure that is allocated in each iteration of a loop and deallocated in the 
following iteration, then the compiler can lift both the allocate and the deallocate out of the loop 
and explicitly reuse the structure. In some cases the compiler may have to allocate two or more 
structures outside of the loop and cycle through them. 


Consider the following example, where M is a matrix that is successively relaxed. In each iteration, 
a new version of M is created and an old one becomes garbage. Furthermore, the loop is bounded 
by parameter k — this allows up to k iterations of the loop to execute in parallel. Therefore, the 
space used by the loop should be bounded by k times the space requirements of a single iteration. 


def relax M size n_steps = 
  {for i <- 1 to n_steps bound k do 
     next M = {matrix (1,size),(1,size) of 
               | [i,j] = relax_point M i j 
               || i <- 1 to size & j <- 1 to size }; 
     Dealloc(M) 
   finally M }; 


Although this version of the procedure reclaims all intermediate storage allocated, it calls the heap 
manager n_steps times to allocate storage and n_steps times to deallocate storage. We only ever 
need k instances of the matrix M at any point in time, and so we should be able to locally manage 
the storage in order to reduce the burden on the heap manager. We would like to specialize storage 
management whenever possible to increase the efficiency for particular uses of storage. 


The previous procedure definition can be transformed into the following code in order to reduce 
the overhead of storage management. 


def relax M size n_steps = 
  { Ms = make_k_matrices ((1,size),(1,size)) k; 
    R = {for i <- 1 to n_steps bound k do 
           next M = Ms![i mod k]; 
           _ = {for i <- 1 to size do 
                  {for j <- 1 to size do 
                     next M[i,j] = relax_point M i j }}; 
           Ms![i mod k] = clear_matrix M; 
         finally M }; 
    _ = free_k_matrices Ms; 
    in R }; 


The procedure make_k_matrices takes the dimensions b of the matrix and the loop bound k and 
returns an M-vector [7, 6] containing k empty matrices each with dimensions b. In each iteration, 
the (i mod k)th element of the vector of empty matrices is taken and used as the value of next 
M. Upon termination of the ith iteration, the current value of M is cleared and put back into the 
(i mod k)th element of the vector of empty matrices. The vector of empty matrices and all of the 
empty matrices are deallocated upon termination of the whole loop by the call to free_k_matrices. 
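

The buffer-cycling idea can be sketched in sequential Python. This is not a translation of the Id
code: the relax_point stencil is invented for the example, the model has no presence-bits or
parallelism, and it assumes k is at least two so that the buffer being written is never the one
being read.

```python
def relax_point(m, i, j):
    # Hypothetical stencil: average of a point and its in-bounds neighbours.
    n = len(m)
    pts = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    vals = [m[a][b] for a, b in pts if 0 <= a < n and 0 <= b < n]
    return sum(vals) / len(vals)


def relax(m, n_steps, k):
    if k < 2:
        raise ValueError("this sequential sketch needs at least two buffers")
    size = len(m)
    # Plays the role of make_k_matrices: k buffers allocated once, up front.
    pool = [[[None] * size for _ in range(size)] for _ in range(k)]
    allocations = k
    for step in range(1, n_steps + 1):
        new_m = pool[step % k]            # take the (step mod k)th buffer
        for i in range(size):
            for j in range(size):
                new_m[i][j] = relax_point(m, i, j)
        m = new_m                         # the old buffer is reused on a later step
    result = [row[:] for row in m]        # copy out before the pool is released
    return result, allocations


start = [[1.0] * 3 for _ in range(3)]
final, allocations = relax(start, n_steps=5, k=2)
```

However many steps are run, the number of matrix allocations stays at k instead of growing with
n_steps.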


This optimization is not currently implemented, but we expect it to be effective in reducing the 
run-time overhead of allocating and deallocating storage. 


Chapter 11 


Conclusion 


We have presented a method for performing object lifetime analysis on non-strict, parallel pro- 
grams. We have shown how to use this lifetime information to verify the correctness of deallocation 
commands in programs and to insert deallocation commands into programs. The central idea of 
this work is recognizing that object lifetimes can be derived from reachability information, and that 
interpreters can determine what objects are reachable from any point in the program. 


The crux of the analysis is the naming of objects. Object names must be related to program 
structure so that dynamic behavior can be related to the static structure of a program. Once we have 
realized that, it is straightforward to derive an abstract interpretation that yields a summarization 
of object reachability. We have presented an operational semantics that derives object names from 
the dynamic structure of a program’s call tree. We discussed several abstractions of this naming 
scheme that allow us to model the allocation and connectivity of objects with varying degrees of 
precision. 


The technique of using abstract interpretation to derive an analysis method from the semantics of a 
programming language shows great promise. The lifetime analysis presented in this thesis is precise 
enough to yield great reductions in the usage of storage in many non-trivial scientific applications. 
Our experiments showed that deallocation code inserted by the compiler could reclaim eighty to one 
hundred percent of the storage allocated by a program. While we do not claim that compilers will 
have this level of effectiveness for all programs, we do claim that there is a large class of programs 
for which these methods are very effective. 


11.1 Further Research 


This thesis is by no means the last word in lifetime analysis. We have taken another step by defining 
a lifetime analysis framework for non-strict, parallel languages, but a number of issues remain to 
be investigated. 


11.1.1 Computing Object Lifetimes 


The algorithm that we described to compute function environments in the abstract interpreter 
is very straightforward, but not necessarily very efficient. The process of computing function 
environments needs to be as efficient as possible if abstract interpretation is going to be a practical 
tool. 


11.1.2 Subscript Analysis 


The interaction of subscript analysis and abstract interpretation is an area that could be explored 
further. Can we do better analysis of programs using arrays if we can determine that certain 
arrays have distinct subregions with potentially different behaviors? For instance, in some scientific 
programs, arrays are created where all of the border elements are shared and all of the inner 
elements are unique. If we could use subscript analysis to distinguish these regions during abstract 
interpretation, we might be able to determine that all of the interior elements could be deallocated 
without having to test for uniqueness. 


11.1.3 Determining Acyclicity of Recursive Objects 


We feel that, by modifying our abstract interpreter, we should be able to perform sharing analysis 
of recursively-typed objects. The goal of this sharing analysis would be to annotate recursively 
typed object representations to indicate whether they form trees, acyclic graphs, or cyclic graphs. 
This information should allow us to distinguish between objects that are definitely trees, objects 
that are definitely acyclic and objects that may be cyclic. This information would be useful because 
the compiler can generate more efficient code to reclaim trees and lists than to reclaim graphs and 
cyclic structures. 


Hendren [20] and Harrison [19] both can determine whether objects are acyclic using information 
about the allocation time of the nodes of a recursively typed object. They used this information 
to determine when statements or subexpressions could be executed in parallel. However, their 
methods depend on having a sequential interpreter, so the methods do not apply to our work. 


The insight we had that allowed us to collect sharing information for the elements of arrays created 
with MakeArray should carry over to recursive objects: the MakeArray construct provides a good 
encapsulation of the expression evaluated to obtain the elements of an array. We can determine if 
sharing is possible by observing the boundary of the encapsulation and seeing if any objects cross 
it, or are inherited. The values that cross the boundary may be shared by the different elements of 
the array. 


Basically, we need to unfold a recursive function once during analysis to determine if the recursive 
calls to the function can share values with the initial call to the function. If there is no sharing 
between the initial call and the recursive calls, then there can be no sharing between any of the 
calls because each of the recursive calls can be considered to be an initial call. Unfortunately, we 
have not seen how to formalize this condition in such a way that it can be included in our lifetime 
analysis method. If we proceed to unfold every recursive call once during abstract interpretation, 
then abstract interpretation will not terminate. Every iteration of the computation of the function 
environment will yield one more input value to which the recursive function must be applied. 


Lent [30] explored the selective unfolding of recursive procedure calls to determine acyclicity of lists. 
He proposed a special mechanism for unfolding function calls one extra time using renamed labels 
and then collapsing the renamed values back into the original domain. This extra level of labels 
should allow us to detect sharing and to annotate the unshared objects, so that we can preserve the 
sharing information once the labels are recompressed. We would like to investigate this technique 
in more detail to determine if it is sound and to extend it beyond detecting acyclic lists to detecting 
trees or directed acyclic graphs. 


11.1.4 Deallocating Complex Structures 


The problem of generating code to deallocate complex structures is related to the problem of 
determining the acyclicity and sharing of complex structures. The current implementation of the 
deallocation insertion algorithm in the Id compiler has a few special cases for inserting code to 
deallocate single cons cells, potentially cyclic lists, and acyclic lists. The problem of generating 
code to traverse and deallocate recursive objects is still open. The compiler may be able to generate 
a procedure for each type to deallocate complete objects of that type. The compiler could then 
compose these special deallocation procedures to deallocate objects consisting of the composition 
of several types of objects. 


The problem of deallocating nested or recursive structures is exacerbated when the pattern of 
sharing within the structure is complex or unknown. Perhaps the run time system could provide a 
function that recursively descends a structure and deallocates all unique objects in that structure. 


11.1.5 Interaction with Garbage Collection 


Another area that deserves more attention is the interaction of explicit storage management with 
garbage collection. Is it really possible for the two to coexist such that the use of explicit deallocation 
commands decreases the overhead of garbage collection? One approach that we think is worth 
considering is having the compiler generate code to allocate storage in an area separate from the 
garbage collected heap. This code can explicitly deallocate the whole area when the objects in it 
are all dead. 
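
The separate-area approach can be sketched as follows. This Python model captures only the bookkeeping: objects are allocated into an area and the whole area is released in one step. The class Region and its methods are hypothetical and do not correspond to any interface in the Id run-time system.

```python
from typing import Any, List


class Region:
    """An allocation area kept apart from the garbage-collected heap."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: List[Any] = []
        self.released = False

    def allocate(self, value: Any) -> Any:
        """Place an object in this area and return it."""
        if self.released:
            raise RuntimeError("allocation in a released region")
        self.objects.append(value)
        return value

    def release(self) -> int:
        """Deallocate the whole area at once; return the object count.

        This is safe only when every object in the area is dead.
        """
        count = len(self.objects)
        self.objects.clear()
        self.released = True
        return count
```

A single release replaces one deallocation command per object, and the garbage collector never has to scan or copy the area.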


Another possibility is to have a dynamic storage manager and a garbage collector that coexist in 
one space. Explicit deallocation commands can be used to deallocate storage. Whatever storage is 
not deallocated explicitly will eventually be deallocated by the garbage collector. 


11.1.6 M-Structures 


Full-fledged Id and KID both have M-structures [7, 6], which are useful when writing programs 
that compute histograms, implement graph algorithms, or implement run-time system code. M- 
structures are mutable structures that allow mutually exclusive access to each word. 
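
The histogram use case can be illustrated with a small sketch of a cell that provides mutually exclusive access. It is written in Python with threads and is not Id code. The operation names take and put, the class MCell, and the function histogram are assumptions made for this example: take waits until the cell is full and empties it, and put refills it.

```python
import threading
from typing import List


class MCell:
    """One word with take/put access: a taker holds the value exclusively."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def put(self, value) -> None:
        with self._cond:
            if self._full:
                raise RuntimeError("put on a full cell")
            self._value = value
            self._full = True
            self._cond.notify()

    def take(self):
        with self._cond:
            while not self._full:
                self._cond.wait()  # block until another thread puts
            self._full = False
            value, self._value = self._value, None
            return value


def histogram(samples: List[int], bins: int, workers: int) -> List[int]:
    """Count samples per bin with concurrent workers sharing the cells."""
    cells = [MCell() for _ in range(bins)]
    for cell in cells:
        cell.put(0)

    def work(part: List[int]) -> None:
        for b in part:
            count = cells[b].take()   # exclusive access to bin b
            cells[b].put(count + 1)

    threads = [
        threading.Thread(target=work, args=(samples[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [cell.take() for cell in cells]
```

Because each increment is a take followed by a put on the same cell, no update is lost even when several workers hit the same bin.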


We would like to see our instrumented and abstract interpreters augmented to handle programs 
using M-structures. We believe that M-structures can be modeled safely in our abstract interpreter 
in the same fashion as I-Structures. But our solution for modeling abstract M-structures does not 
solve the problem of modeling M-structures in the instrumented interpreter. It seems that the store 
would have to be threaded through the interpreter in order for the interpreter to model mutually 
exclusive access to each M-structure element. We would like to find a solution to this problem that 
does not obscure the parallelism of the interpreter. 


Once M-structures are added to KID, we will have to model barriers in full generality, which 
may involve computing a graph of activation label precedence. This precedence relation would be 
analogous to our terminates before relation. Once we have computed the precedence graph, we 
may be able to determine in some cases whether programs deadlock. 


11.2 Other Research Directions 


Other semantic analyses are useful for a wide variety of reasons. Strictness analysis is helpful 
in determining that portions of a program may be sequentialized. Sequentialization is a useful 
optimization for compiling non-strict languages because it eliminates redundant synchronization. 
Dependence analysis and interference analysis are also important analyses in the field of optimizing 
compilers. 


The abstract interpretation framework presented in this thesis is a sound basis for a wide variety 
of other such analyses of non-strict or parallel programs. By changing the abstract evaluators and 
value domains presented in this thesis, the abstract interpreter can be restructured to support 
these other data-dependent analysis methods. 